Re: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread geetha anjali
Thanks a lot Uwe, will check it out in the new 3.6.1.


On Mon, Jul 23, 2012 at 11:46 AM, Uwe Schindler  wrote:

> Hi Geetha Anjali,
>
> Lucene will not use MMapDirectory by default on 32 bit platforms or if you
> are not using an Oracle/Sun JVM. On 64 bit platforms, Lucene will use it,
> but will accept the risk of segfaulting when unmapping the buffers - Lucene
> does try its best to prevent this. It is a risk, but accepted by the Lucene
> developers.
>
> To come back to your issue: It is perfectly fine on Solr/Lucene to not
> unmap all buffers as long as the index is open. The number of open file
> handles is another discussion, but not related at all to MMap. If you are
> using an old Lucene version (like 3.0.2), you should upgrade in any case.
> The most recent one is 3.6.1.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: geetha anjali [mailto:anjaliprabh...@gmail.com]
> > Sent: Monday, July 23, 2012 4:28 AM
> > Subject: Re: How to setup SimpleFSDirectoryFactory
> >
> > Hi Uwe,
> > Thanks Uwe. Have you checked the bug in the JRE for MMapDirectory? I was
> > mentioning this; it is posted on the Oracle site and in the API doc.
> > They accept this as a bug; have you seen it?
> >
> > "MMapDirectory > u=ene/store/MMapDirectory.html>uses
> > memory-mapped IO when reading. This is a good choice if you have plenty
> of
> > virtual memory relative to your index size, eg if you are running on a 64
> bit JRE,
> > or you are running on a 32 bit JRE but your index sizes are small enough
> to fit
> > into the virtual memory space. Java has currently the limitation of not
> being
> > able to unmap files from user code. The files are unmapped, when GC
> releases
> > the byte buffers. *Due to this
> > bugin
> > Sun's JRE,
> > MMapDirectory's
> >
> **IndexInput.close()*<
> http://lucene.apache.org/java/3_0_2/api/core/org/apac
> > =e/lucene/store/IndexInput.html#close%28%29>
> > * is unable to close the underlying OS file handle. Only when GC finally
> collects
> > the underlying objects, which could be quite some time later, will the
> file
> > handle be closed*. *This will consume additional transient disk
> > usage*: on Windows, attempts to delete or overwrite the files will result
> in an
> > exception; on other platforms, which typically have a "delete on last
> close"
> > semantics, while such operations will succeed, the bytes are still
> consuming
> > space on disk. For many applications this limitation is not a problem
> (e.g. if you
> > have plenty of disk space, and you don't rely on overwriting files on
> Windows)
> > but it's still an important limitation to be aware of. This class
> supplies
> a
> > (possibly dangerous) workaround mentioned in the bug report, which may
> fail
> > on non-Sun JVMs. "
> >
> >
> > Thanks,
> >
> >
> > On Mon, Jul 23, 2012 at 4:13 AM, Uwe Schindler  wrote:
> >
> > > It is hopeless to talk to both of you, you don't understand virtual
> > > memory:
> > >
> > > > I get a similar situation using Windows 2008 and Solr 3.6. Memory
> > > > using mmap is never released. Even if I turn off traffic and commit
> > > > and do a manual gc. If the size of the index is 3gb then memory used
> > > > will be heap + 3gb of shared used. If I use a 6gb index I get heap +
> > > > 6gb.
> > >
> > > That is expected, but we are talking not about allocated physical
> > > memory, we are talking about allocated ADDRESS SPACE and you have 2^47
> > > of that on 64bit platforms. There is no physical memory wasted or
> > > allocated - please read the blog post a third, fourth, fifth... or
> > > tenth time, until it is obvious. You should also go back to school and
> > > take a course on system programming and operating system kernels.
> > > Every CS student gets that taught in his first year (at least in
> > > Germany).
> > >
> > > Java's GC has nothing to do with that - as long as the index is open,
> > > ADDRESS SPACE is assigned. We are talking not about memory nor Java
> > > heap space.
> > >
> > > > If I turn off
> > > > MMapDirectoryFactory it goes back down. When is the MMap supposed to
> > > > release memory? It only does it on JVM restart now.
> > >
> > > Can you please stop spreading nonsense about MMapDirectory with no
> > > knowledge behind? http://www.linuxatemyram.com/ - Also applies to
> > > Windows.
> > >
> > > Uwe
> > >
> > > > Bill Bell
> > > > Sent from mobile
> > > >
> > > >
> > > > On Jul 22, 2012, at 6:21 AM, geetha anjali
> > > >  wrote:
> > > > > It happens in 3.6; for this reason I thought of moving to solandra.
> > > > > If I do a commit, all the documents are persisted without any
> > > > > issues. There are no issues in terms of any functionality; the only
> > > > > thing that happens is an increase in physical RAM, which goes
> > > > > higher and higher, stops at maximum, and never comes down.

RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Uwe Schindler
Hi Geetha Anjali,

Lucene will not use MMapDirectory by default on 32 bit platforms or if you
are not using an Oracle/Sun JVM. On 64 bit platforms, Lucene will use it, but
will accept the risk of segfaulting when unmapping the buffers - Lucene
does try its best to prevent this. It is a risk, but accepted by the Lucene
developers.
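
A minimal sketch of how this trade-off surfaces in the Lucene 3.x API (the
index path is hypothetical):

    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.store.MMapDirectory;

    public class UnmapCheck {
        public static void main(String[] args) throws IOException {
            MMapDirectory dir = new MMapDirectory(new File("/path/to/index"));
            // True on Oracle/Sun JVMs that expose the cleaner hack; false elsewhere.
            System.out.println("unmap supported: " + MMapDirectory.UNMAP_SUPPORTED);
            // Opt in to eager unmapping, accepting the segfault risk described
            // above if a buffer were ever accessed after being unmapped.
            dir.setUseUnmap(MMapDirectory.UNMAP_SUPPORTED);
            dir.close();
        }
    }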

To come back to your issue: It is perfectly fine on Solr/Lucene to not unmap
all buffers as long as the index is open. The number of open file handles is
another discussion, but not related at all to MMap. If you are using an old
Lucene version (like 3.0.2), you should upgrade in any case. The most recent
one is 3.6.1.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: geetha anjali [mailto:anjaliprabh...@gmail.com]
> Sent: Monday, July 23, 2012 4:28 AM
> Subject: Re: How to setup SimpleFSDirectoryFactory
> 
> Hi Uwe,
> Thanks Uwe. Have you checked the bug in the JRE for MMapDirectory? I was
> mentioning this; it is posted on the Oracle site and in the API doc.
> They accept this as a bug; have you seen it?
> 
> "MMapDirectory u=ene/store/MMapDirectory.html>uses
> memory-mapped IO when reading. This is a good choice if you have plenty of
> virtual memory relative to your index size, eg if you are running on a 64
bit JRE,
> or you are running on a 32 bit JRE but your index sizes are small enough
to fit
> into the virtual memory space. Java has currently the limitation of not
being
> able to unmap files from user code. The files are unmapped, when GC
releases
> the byte buffers. *Due to this
> bugin
> Sun's JRE,
> MMapDirectory's
>
**IndexInput.close()* =e/lucene/store/IndexInput.html#close%28%29>
> * is unable to close the underlying OS file handle. Only when GC finally
collects
> the underlying objects, which could be quite some time later, will the
file
> handle be closed*. *This will consume additional transient disk
> usage*: on Windows, attempts to delete or overwrite the files will result
in an
> exception; on other platforms, which typically have a "delete on last
close"
> semantics, while such operations will succeed, the bytes are still
consuming
> space on disk. For many applications this limitation is not a problem
(e.g. if you
> have plenty of disk space, and you don't rely on overwriting files on
Windows)
> but it's still an important limitation to be aware of. This class supplies
a
> (possibly dangerous) workaround mentioned in the bug report, which may
fail
> on non-Sun JVMs. "
> 
> 
> Thanks,
> 
> 
> On Mon, Jul 23, 2012 at 4:13 AM, Uwe Schindler  wrote:
> 
> > It is hopeless to talk to both of you, you don't understand virtual
> > memory:
> >
> > > I get a similar situation using Windows 2008 and Solr 3.6. Memory
> > > using mmap is never released. Even if I turn off traffic and commit
> > > and do a manual gc. If the size of the index is 3gb then memory used
> > > will be heap + 3gb of shared used. If I use a 6gb index I get heap +
> > > 6gb.
> >
> > That is expected, but we are talking not about allocated physical
> > memory, we are talking about allocated ADDRESS SPACE and you have 2^47
> > of that on 64bit platforms. There is no physical memory wasted or
> > allocated - please read the blog post a third, fourth, fifth... or
> > tenth time, until it is obvious. You should also go back to school and
> > take a course on system programming and operating system kernels.
> > Every CS student gets that taught in his first year (at least in
> > Germany).
> >
> > Java's GC has nothing to do with that - as long as the index is open,
> > ADDRESS SPACE is assigned. We are talking not about memory nor Java
> > heap space.
> >
> > > If I turn off
> > > MMapDirectoryFactory it goes back down. When is the MMap supposed to
> > > release memory? It only does it on JVM restart now.
> >
> > Can you please stop spreading nonsense about MMapDirectory with no
> > knowledge behind? http://www.linuxatemyram.com/ - Also applies to
> > Windows.
> >
> > Uwe
> >
> > > Bill Bell
> > > Sent from mobile
> > >
> > >
> > > On Jul 22, 2012, at 6:21 AM, geetha anjali
> > >  wrote:
> > > > It happens in 3.6; for this reason I thought of moving to solandra.
> > > > If I do a commit, all the documents are persisted without any
> > > > issues. There are no issues in terms of any functionality; the only
> > > > thing that happens is an increase in physical RAM, which goes higher
> > > > and higher, stops at maximum, and never comes down.
> > > >
> > > > Thanks
> > > >
> > > > On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog 
> > > wrote:
> > > >
> > > >> Interesting. Which version of Solr is this? What happens if you
> > > >> do a commit?
> > > >>
> > > >> On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
> > > >> wrote:
> > > >>> Hi uwe,
> > 

Re: [Announce] Solr 4.0-ALPHA with RankingAlgorithm 1.4.4 with Realtime NRT available for download

2012-07-22 Thread Nagendra Nagarajayya
Realtime NRT is an NRT implementation available for Solr 1.4.1 to Solr
4.0. To enable NRT it makes an NRTIndexReader available to the
IndexSearcher for searching the index. To do this it does not close the
SolrIndexSearcher, which is a very heavy object with caches, etc. Since
the Searcher is never closed, it always uses the most recent
NRTIndexReader for searching, and you get a pipe that is always filled
with newly updated documents. The code changes are to handle this dynamic
pipe that may always have something new, as in a realtime system.
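
For comparison, the stock Lucene 3.x NRT pattern (a hedged sketch, not the
Solr-RA code) keeps one NRT reader open and cheaply refreshes it from the
IndexWriter:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;

    class NrtSearcherSketch {
        private IndexReader reader;

        NrtSearcherSketch(IndexWriter writer) throws IOException {
            reader = IndexReader.open(writer, true); // NRT reader over the writer
        }

        IndexSearcher acquire(IndexWriter writer) throws IOException {
            // Cheap refresh: only the segments that changed are reopened.
            IndexReader newReader = IndexReader.openIfChanged(reader, writer, true);
            if (newReader != null) {
                reader.close();
                reader = newReader;
            }
            return new IndexSearcher(reader);
        }
    }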


Realtime NRT is different from soft commit as it does not close the
SolrIndexSearcher object every 1000 secs, invalidating the caches, etc.
SolrIndexSearcher is a very heavy, ref-counted object with caches, etc.
Closing it every time may turn out to be expensive.


I am contributing Realtime NRT to Solr 4.0 and am working on making
available a patch, etc.


Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org


On 7/22/2012 2:03 PM, Darren Govoni wrote:

What exactly is "Realtime NRT" (Near Real Time)?

On Sun, 2012-07-22 at 14:07 -0700, Nagendra Nagarajayya wrote:


Hi!

I am very excited to announce the availability of Solr 4.0-ALPHA with
RankingAlgorithm 1.4.4 with Realtime NRT. The Realtime NRT
implementation now supports both RankingAlgorithm and Lucene. Realtime
NRT is a higher performance and more granular NRT implementation than
soft commit. The update performance is about 70,000 documents / sec*.
You can also scale up to 2 billion documents* in a single core, and
query a half-billion-document index in ms**.

RankingAlgorithm 1.4.4 supports the entire Lucene Query Syntax, ± and/or
boolean queries and is compatible with the new Lucene 4.0-ALPHA api.

You can get more information about Solr 4.0-ALPHA with RankingAlgorithm
1.4.4 Realtime performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.0-ALPHA with RankingAlgorithm 1.4.4 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

* performance seen at a user installation of Solr 4.0 with
RankingAlgorithm 1.4.3
** performance seen when using the age feature








Re: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread geetha anjali
Hi Uwe,
Thanks Uwe. Have you checked the bug in the JRE for MMapDirectory? I was
mentioning this; it is posted on the Oracle site and in the API doc.
They accept this as a bug; have you seen it?

"MMapDirectory
<http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/store/MMapDirectory.html>
uses memory-mapped IO when reading. This is a good choice if you have
plenty of virtual memory relative to your index size, eg if you are running
on a 64 bit JRE, or you are running on a 32 bit JRE but your index sizes
are small enough to fit into the virtual memory space. Java has currently
the limitation of not being able to unmap files from user code. The files
are unmapped, when GC releases the byte buffers. *Due to this bug in
Sun's JRE, MMapDirectory's* *IndexInput.close()*
<http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/store/IndexInput.html#close%28%29>
* is unable to close the underlying OS file handle. Only when GC finally
collects the underlying objects, which could be quite some time later, will
the file handle be closed*. *This will consume additional transient disk
usage*: on Windows, attempts to delete or overwrite the files will result
in an exception; on other platforms, which typically have a "delete on last
close" semantics, while such operations will succeed, the bytes are still
consuming space on disk. For many applications this limitation is not a
problem (e.g. if you have plenty of disk space, and you don't rely on
overwriting files on Windows) but it's still an important limitation to be
aware of. This class supplies a (possibly dangerous) workaround mentioned
in the bug report, which may fail on non-Sun JVMs."


Thanks,


On Mon, Jul 23, 2012 at 4:13 AM, Uwe Schindler  wrote:

> It is hopeless to talk to both of you, you don't understand virtual memory:
>
> > I get a similar situation using Windows 2008 and Solr 3.6. Memory using
> > mmap is never released. Even if I turn off traffic and commit and do a
> > manual gc. If the size of the index is 3gb then memory used will be heap
> > + 3gb of shared used. If I use a 6gb index I get heap + 6gb.
>
> That is expected, but we are talking not about allocated physical memory,
> we
> are talking about allocated ADDRESS SPACE and you have 2^47 of that on
> 64bit
> platforms. There is no physical memory wasted or allocated - please read
> the
> blog post a third, fourth, fifth... or tenth time, until it is obvious. You
> should also go back to school and take a course on system programming and
> operating system kernels. Every CS student gets that taught in his first
> year (at least in Germany).
>
> Java's GC has nothing to do with that - as long as the index is open,
> ADDRESS SPACE is assigned. We are talking not about memory nor Java heap
> space.
>
> > If I turn off
> > MMapDirectoryFactory it goes back down. When is the MMap supposed to
> > release memory? It only does it on JVM restart now.
>
> Can you please stop spreading nonsense about MMapDirectory with no
> knowledge
> behind? http://www.linuxatemyram.com/ - Also applies to Windows.
>
> Uwe
>
> > Bill Bell
> > Sent from mobile
> >
> >
> > On Jul 22, 2012, at 6:21 AM, geetha anjali 
> > wrote:
> > > It happens in 3.6; for this reason I thought of moving to solandra.
> > > If I do a commit, all the documents are persisted without any issues.
> > > There are no issues in terms of any functionality; the only thing that
> > > happens is an increase in physical RAM, which goes higher and higher,
> > > stops at maximum, and never comes down.
> > >
> > > Thanks
> > >
> > > On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog 
> > wrote:
> > >
> > >> Interesting. Which version of Solr is this? What happens if you do a
> > >> commit?
> > >>
> > >> On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
> > >> wrote:
> > >>> Hi uwe,
> > >>> Great to know. We have files indexing 1/min. After 30 mins I see all
> > >>> my physical memory, say, 100 percent used (Windows). On deep
> > >>> investigation I found that mmap is not releasing OS file handles. Do
> > >>> you find this behaviour?
> > >>>
> > >>> Thanks
> > >>>
> > >>> On 20 Jul 2012 14:04, "Uwe Schindler"  wrote:
> > >>>
> > >>> Hi Bill,
> > >>>
> > >>> MMapDirectory uses the file system cache of your operating system,
> > >>> which has the following consequences: In Linux, top & free should
> > >>> normally report only *little* free memory, because the O/S uses all
> > >>> memory not allocated by applications to cache disk I/O (and shows it
> > >>> as allocated, so having 0% free memory is just normal on Linux and
> > >>> also Windows). If you have other applications or Lucene/Solr itself
> > >>> that allocate lots of heap space or malloc() a lot, then you are
> > >>> reducing free physical memory, so reducing fs cache. This depends
> > >>> also on your swappiness parameter (if swappiness is higher, inactive
> > >>> processes are swapped out easier, default is 60% on linux - freeing
> > >>> more space for FS cache - the backside is of course that maybe
> > >>> in-memory structures of Lucene and other applications get paged out).

RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Uwe Schindler
It is hopeless to talk to both of you, you don't understand virtual memory:

> I get a similar situation using Windows 2008 and Solr 3.6. Memory using
> mmap is never released. Even if I turn off traffic and commit and do a
> manual gc. If the size of the index is 3gb then memory used will be heap +
> 3gb of shared used. If I use a 6gb index I get heap + 6gb.

That is expected, but we are talking not about allocated physical memory, we
are talking about allocated ADDRESS SPACE and you have 2^47 of that on 64bit
platforms. There is no physical memory wasted or allocated - please read the
blog post a third, fourth, fifth... or tenth time, until it is obvious. You
should also go back to school and take a course on system programming and
operating system kernels. Every CS student gets that taught in his first
year (at least in Germany).

Java's GC has nothing to do with that - as long as the index is open,
ADDRESS SPACE is assigned. We are talking not about memory nor Java heap
space.
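
The distinction is easy to demonstrate outside Lucene. A small sketch (file
path hypothetical) that maps a file, reserving only address space until
pages are actually touched:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapAddressSpaceDemo {
        public static void main(String[] args) throws IOException {
            RandomAccessFile raf = new RandomAccessFile("/path/to/bigfile", "r");
            try {
                // A single MappedByteBuffer is limited to 2 GB, which is why
                // Lucene maps large files in chunks.
                long size = Math.min(raf.length(), Integer.MAX_VALUE);
                // This reserves ADDRESS SPACE only; no physical RAM is used
                // yet, and top/Task Manager count it as shared/mapped.
                MappedByteBuffer buf = raf.getChannel()
                        .map(FileChannel.MapMode.READ_ONLY, 0, size);
                // Only now does the OS fault one page into physical memory.
                System.out.println("first byte: " + buf.get(0));
            } finally {
                raf.close();
            }
        }
    }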

> If I turn off
> MMapDirectoryFactory it goes back down. When is the MMap supposed to
> release memory? It only does it on JVM restart now.

Can you please stop spreading nonsense about MMapDirectory with no knowledge
behind? http://www.linuxatemyram.com/ - Also applies to Windows.

Uwe

> Bill Bell
> Sent from mobile
> 
> 
> On Jul 22, 2012, at 6:21 AM, geetha anjali 
> wrote:
> > It happens in 3.6; for this reason I thought of moving to solandra.
> > If I do a commit, all the documents are persisted without any issues.
> > There are no issues in terms of any functionality; the only thing that
> > happens is an increase in physical RAM, which goes higher and higher,
> > stops at maximum, and never comes down.
> >
> > Thanks
> >
> > On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog 
> wrote:
> >
> >> Interesting. Which version of Solr is this? What happens if you do a
> >> commit?
> >>
> >> On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
> >> wrote:
> >>> Hi uwe,
> >>> Great to know. We have files indexing 1/min. After 30 mins I see all
> >>> my physical memory, say, 100 percent used (Windows). On deep
> >>> investigation I found that mmap is not releasing OS file handles. Do
> >>> you find this behaviour?
> >>>
> >>> Thanks
> >>>
> >>> On 20 Jul 2012 14:04, "Uwe Schindler"  wrote:
> >>>
> >>> Hi Bill,
> >>>
> >>> MMapDirectory uses the file system cache of your operating system,
> >>> which has the following consequences: In Linux, top & free should
> >>> normally report only *little* free memory, because the O/S uses all
> >>> memory not allocated by applications to cache disk I/O (and shows it
> >>> as allocated, so having 0% free memory is just normal on Linux and
> >>> also Windows). If you have other applications or Lucene/Solr itself
> >>> that allocate lots of heap space or malloc() a lot, then you are
> >>> reducing free physical memory, so reducing fs cache. This depends also
> >>> on your swappiness parameter (if swappiness is higher, inactive
> >>> processes are swapped out easier, default is 60% on linux - freeing
> >>> more space for FS cache - the backside is of course that maybe
> >>> in-memory structures of Lucene and other applications get paged out).
> >>>
> >>> You will only see no paging at all if all memory allocated by all
> >>> applications plus all mmapped files fits into memory. But paging
> >>> in/out the mmapped Lucene index is much cheaper than using
> >>> SimpleFSDirectory or NIOFSDirectory. If you use SimpleFS or NIO and
> >>> your index is not in FS cache, it will also read it from physical disk
> >>> again, so where is the difference? Paging is actually cheaper as no
> >>> syscalls are involved.
> >>>
> >>> If you want as much as possible of your index in physical RAM, copy
> >>> it to /dev/null regularly and buy more RAM :-)
> >>>
> >>>
> >>> -
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> >>> eMail: uwe@thetaphi...
> >>>
>  From: Bill Bell [mailto:billnb...@gmail.com]
>  Sent: Friday, July 20, 2012 5:17 AM
>  Subject: Re: ...
>  stop using it? The least used memory will be removed from the OS
>  automatically? I see some paging. Wouldn't paging slow down the
>  querying?
> >>>
> 
>  My index is 10gb and every 8 hours we get most of it in shared memory.
>  The memory is 99 percent used, and that does not leave any room for
>  other apps.
> >>>
>  Other implications?
> 
>  Sent from my mobile device
>  720-256-8076
> 
>  On Jul 19, 2012, at 9:49 A...
>  Heap space or free system RAM:
> >>>
> >
> >
> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > Uwe
> > ...
> >> use it since you might run out of memory on large indexes right?
> >>>
> >>
> >> Here is how I got SimpleFSDirectoryFactory to work. Just set
> >> -Dsolr.directoryFactor...
>

Re: "Invalid or unreadable WAR file : .../solr.war" when starting solr 3.6.1 app on Tomcat 7?

2012-07-22 Thread k9157

On Sun, Jul 22, 2012, at 02:08 PM, Jon Sharp wrote:
> /srv/www sounds like a doc root for a web server...

It's a simple directory.

It's not configured as doc root for my web server.


Re: "Invalid or unreadable WAR file : .../solr.war" when starting solr 3.6.1 app on Tomcat 7?

2012-07-22 Thread Jon Sharp
/srv/www sounds like a doc root for a web server...



On Jul 22, 2012, at 1:24 PM, k9...@operamail.com wrote:

> 
> I've installed
> 
>rpm -qa | grep -i ^tomcat-7
>tomcat-7.0.27-7.1.noarch
> 
> with
> 
>update-alternatives --query java | grep Value
>Value: /usr/lib64/jvm/jre-1.7.0-openjdk/bin/java
> 
> on
>GNU/Linux
>x86_64
>kernel 3.1.10
> 
> Tomcat is started & listening @ 127.0.0.1
> 
>netstat -pan --tcp | grep 8080
>tcp0  0 127.0.0.1:8080  0.0.0.0:*   
>   LISTEN  29513/java
> 
> @
> 
>http://localhost:8080/
> 
> I see
> 
>Apache Tomcat/7.0.27
>If you're seeing this, you've successfully installed Tomcat.
>Congratulations!
>...
> 
> Deploying SOLR 3.6.1
> 
>cd /usr/local/apache-solr-3.6.1
>/bin/cp -Rf ./example/solr/*   /srv/www/solr/home
>/bin/cp -f  ./dist/apache-solr-3.6.1.war  
>/srv/www/solr/home/solr.war
>/bin/cp -f  ./example/solr/conf/schema.xml
>/srv/www/solr/home/conf/
> 
> then, define solr/home
> 
>cat /etc/tomcat/Catalina/localhost/solr.xml
><Context docBase="/srv/www/solr/home/solr.war"
>privileged="true" allowLinking="true"
>crossContext="true" >
>  <Environment name="solr/home" type="java.lang.String"
>value="/srv/www/solr/home" override="true" />
></Context>
> 
> and reference it,
> 
>grep dataDir /srv/www/solr/home/conf/solrconfig.xml
>  <dataDir>${solr.data.dir:/srv/www/solr/home/data}</dataDir>
> 
> restart tomcat, then @:
> 
>http://localhost:8080/manager/html
> 
> lists the deployed "/solr" app as NOT running,
> 
>Path   Version Display Name   
>Running Sessions
>/  None specified  Welcome to Tomcat  
>true0
>/docs  None specified  Tomcat Documentation   
>true0
>/examples  None specified  Servlet and JSP Examples   
>true0
>/host-manager  None specified  Tomcat Host Manager Application
>true1
>/manager   None specified  Tomcat Manager Application 
>true1
>/sampleNone specified  Hello, World Application   
>true0
>/solr  None specified 
>false   0
> 
> clicking "start" @ the "/solr" app path link returns,
> 
> @ browser,
> 
>HTTP Status 404 - /solr
>type Status report
>message /solr
>description The requested resource (/solr) is not available.
>Apache Tomcat/7.0.27
> 
> & @ logs:
> 
> 
>==> /var/log/tomcat/manager.2012-07-22.log <==
>Jul 22, 2012 12:03:14 PM
>org.apache.catalina.core.ApplicationContext log
>INFO: HTMLManager: start: Starting web application '/solr'
> 
>==> /var/log/tomcat/catalina.2012-07-22.log <==
>Jul 22, 2012 12:03:14 PM
>org.apache.catalina.core.StandardContext resourcesStart
>SEVERE: Error starting static Resources
>java.lang.IllegalArgumentException: Invalid or unreadable WAR
>file : /srv/www/solr/home/solr.war
>at
>
> org.apache.naming.resources.WARDirContext.setDocBase(WARDirContext.java:136)
>at
>
> org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4894)
>at
>
> org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5074)
>at
>
> org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>at
>
> org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1247)
>at
>
> org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:747)
>at
>
> org.apache.catalina.manager.HTMLManagerServlet.doPost(HTMLManagerServlet.java:222)
>at
>javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
>at
>javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
>at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
>at
>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>at
>
> org.apache.catalina.filters.CsrfPreventionFilter.doFilter(CsrfPreventionFilter.java:186)
>at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>at
>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>at
>
> org.apache.catalina.filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:108)
>at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>at
>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>at
>
> org.apache.catalina.core.

Re: Index relational XML with DataImportHandler

2012-07-22 Thread Tobias Berg
Ok, problem found by digging in the source code. Whether it is a bug or
"works by design" I don't know, but the cause is where the translation of
the variable ${store.id} is made.

The translation is made in the method initXpathReader() with these lines:

>   String xpath = field.get(XPATH);
>   *xpath = context.replaceTokens(xpath);*
>   xpathReader.addField(field.get(DataImporter.COLUMN),
>       xpath,
>       Boolean.parseBoolean(field.get(DataImporter.MULTI_VALUED)),
>       flags);
> }


The line *xpath = context.replaceTokens(xpath);* translates the variable
to its actual value. initXpathReader() is called in the init() method but
is *only* called once for each entity definition:

>   if (xpathReader == null)
>       initXpathReader();


This means that the first time initXpathReader() is called, ${store.id} is
translated to 0102 (the first id of the store). When the next store id is
encountered, the xpathReader is already initialized so initXpathReader() is
not called, thus the xpath expression is not updated with the new store id.

There are a bunch of other things happening in initXpathReader() so I'm
not sure if it's safe to just remove the null-check. But, looking at the
SqlEntityProcessor, the translation of the variables in the query string is
performed in the getRow() method, and not in the init() method, so I think
that the null-check should either be removed or the xpath expression
translation should be moved so it is performed each time.
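
The pitfall generalizes beyond DIH; a self-contained sketch (names
hypothetical, not the DIH source) of resolving a template once at init time
versus once per row:

    import java.util.Collections;
    import java.util.Map;

    public class StaleTokenDemo {
        // Stand-in for context.replaceTokens(): substitutes ${...} variables.
        static String replaceTokens(String template, Map<String, String> vars) {
            String out = template;
            for (Map.Entry<String, String> e : vars.entrySet()) {
                out = out.replace("${" + e.getKey() + "}", e.getValue());
            }
            return out;
        }

        public static void main(String[] args) {
            String template = "/StoreArticles/Store[@StoreId='${store.id}']/ArticleId";
            String cached = null; // resolved once, like the xpathReader in init()
            for (String id : new String[] {"0102", "0104"}) {
                Map<String, String> vars = Collections.singletonMap("store.id", id);
                if (cached == null) {
                    cached = replaceTokens(template, vars); // sticks at 0102
                }
                System.out.println("init-once: " + cached);
                System.out.println("per-row  : " + replaceTokens(template, vars));
            }
        }
    }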

/Tobias

2012/7/22 Tobias Berg 

> The articleId field is the only field in the correlation file so I just
> need to get that one working.
>
> I tried putting the condition in the forEach section. If I hardcode a
> value, like 0104, it works, but it doesn't work with the variable. Haven't
> looked at the source code yet, but maybe forEach doesn't support variables?
> That could be a nice patch :)
>
> I thought about $skipDoc but can't figure out how I want to use it, since
> I want to add the field, it's just that it picks the wrong value. Do you
> have something in mind in how to use it for my use-case?
>
> I'll take a look at the source code to see if it can be a bug.
>
> /Tobias
>
> 2012/7/22 Alexandre Rafalovitch 
>
>> I am still struggling with nested DIH myself, but I notice that your
>> correlation condition is on the field level (@StoreId='${store.id}').
>> Were you planning to repeat it for each field definition?
>>
>> Have you tried putting it instead in the forEach section?
>>
>> Alternatively, maybe you need to use $skipDoc as in the Wikipedia
>> import example?
>>
>> Regards,
>>Alex.
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Sat, Jul 21, 2012 at 1:34 PM, Tobias Berg 
>> wrote:
>> > Hi,
>> >
>> > I'm trying to index a set of stores and their articles. I have two
>> > XML-files, one that contains the data of the stores and one that
>> contains
>> > articles for each store. I'm using DIH with XPathEntityProcessor to
>> process
>> > the file containing the store, and using a nested entity I try to get
>> all
>> > articles that belongs to the specific store. The problem I encounter is
>> > that every store gets the same articles.
>> >
>> > For testing purposes I've stripped down the xml-files to only include
>> > id:s. The store file (StoresTest.xml) looks like this:
>> >
>> > <Stores>
>> >   <Store>0102</Store>
>> >   <Store>0104</Store>
>> > </Stores>
>> >
>> > The Store-Articles relations file (StoreArticlesTest.xml) looks like
>> > this:
>> >
>> > <StoreArticles>
>> >   <Store StoreId="0102"><ArticleId>18004</ArticleId></Store>
>> >   <Store StoreId="0104"><ArticleId>17004</ArticleId><ArticleId>10004</ArticleId></Store>
>> > </StoreArticles>
>> >
>> > And my dih-config file looks like this:
>> >
>> > <dataConfig>
>> >   <dataSource type="FileDataSource" />
>> >   <document>
>> >     <entity name="store"
>> >             processor="XPathEntityProcessor"
>> >             stream="true"
>> >             forEach="/Stores/Store"
>> >             url="../../../data/StoresTest.xml"
>> >             transformer="TemplateTransformer">
>> >       <field column="id" xpath="/Stores/Store" />
>> >       <entity name="article"
>> >               processor="XPathEntityProcessor"
>> >               stream="true"
>> >               forEach="/StoreArticles"
>> >               url="../../../data/StoreArticlesTest.xml"
>> >               transformer="LogTransformer"
>> >               logTemplate="Processing ${store.id}" logLevel="info"
>> >               rootEntity="true">
>> >         <field column="articleId"
>> >                xpath="/StoreArticles/Store[@StoreId='${store.id}']/ArticleId" />
>> >       </entity>
>> >     </entity>
>> >   </document>
>> > </dataConfig>
>> >
>> > The result I get in Solr is this:
>> >
>> > <response>
>> > ...
>> > <result name="response" numFound="2" start="0">
>> >   <doc>
>> >     <str name="id">0102</str>
>> >     <arr name="articleId">
>> >       <str>18004</str>
>> >     </arr>
>> >   </doc>
>> >   <doc>
>> >     <str name="id">0104</str>
>> >     <arr name="articleId">
>> >       <str>18004</str>
>> >     </arr>
>> >   </doc>
>> > </result>
>> > </response>
>> >
>> > As you see, both stores get the article for the first store. I would
>> > have expected the second store to have two articles: 17004 and 10004.
>> >
>> > In the log messages printed using LogTransformer I see that each
>> > store.id is processed but somehow it only picks up the articles for the
>> > first store.
>> >
>> > Any ideas?
>> >
>> > /Tobias Berg
>>
>
>


Re: [Announce] Solr 4.0-ALPHA with RankingAlgorithm 1.4.4 with Realtime NRT available for download

2012-07-22 Thread Darren Govoni
What exactly is "Realtime NRT" (Near Real Time)?

On Sun, 2012-07-22 at 14:07 -0700, Nagendra Nagarajayya wrote:

> Hi!
> 
> I am very excited to announce the availability of Solr 4.0-ALPHA with 
> RankingAlgorithm 1.4.4 with Realtime NRT. The Realtime NRT 
> implementation now supports both RankingAlgorithm and Lucene. Realtime 
> NRT is a higher performance and more granular NRT implementation than 
> soft commit. The update performance is about 70,000 documents / sec*. 
> You can also scale up to 2 billion documents* in a single core, and 
> query a half-billion-document index in ms**.
> 
> RankingAlgorithm 1.4.4 supports the entire Lucene Query Syntax, ± and/or 
> boolean queries and is compatible with the new Lucene 4.0-ALPHA api.
> 
> You can get more information about Solr 4.0-ALPHA with RankingAlgorithm 
> 1.4.4 Realtime performance from here:
> http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x
> 
> You can download Solr 4.0-ALPHA with RankingAlgorithm 1.4.4 from here:
> http://solr-ra.tgels.org
> 
> Please download and give the new version a try.
> 
> Regards,
> 
> Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
> 
> * performance seen at a user installation of Solr 4.0 with 
> RankingAlgorithm 1.4.3
> ** performance seen when using the age feature
> 




[Announce] Solr 4.0-ALPHA with RankingAlgorithm 1.4.4 with Realtime NRT available for download

2012-07-22 Thread Nagendra Nagarajayya

Hi!

I am very excited to announce the availability of Solr 4.0-ALPHA with 
RankingAlgorithm 1.4.4 with Realtime NRT. The Realtime NRT 
implementation now supports both RankingAlgorithm and Lucene. Realtime 
NRT is a higher performance and more granular NRT implementation than 
soft commit. The update performance is about 70,000 documents / sec*. 
You can also scale up to 2 billion documents* in a single core, and 
query a half-billion-document index in ms**.


RankingAlgorithm 1.4.4 supports the entire Lucene Query Syntax, ± and/or 
boolean queries and is compatible with the new Lucene 4.0-ALPHA api.


You can get more information about Solr 4.0-ALPHA with RankingAlgorithm 
1.4.4 Realtime performance from here:

http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.0-ALPHA with RankingAlgorithm 1.4.4 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

* performance seen at a user installation of Solr 4.0 with 
RankingAlgorithm 1.4.3

** performance seen when using the age feature



Re: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Bill Bell
I get a similar situation using Windows 2008 and Solr 3.6. Memory using mmap is 
never released. Even if I turn off traffic and commit and do a manual gc. If 
the size of the index is 3gb then memory used will be heap + 3gb of shared 
used. If I use a 6gb index I get heap + 6gb. If I turn off MMapDirectoryFactory 
it goes back down. When is the MMap supposed to release memory? It only does 
it on JVM restart now.
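
For reference, the Directory implementations discussed in this thread can be
chosen explicitly in the Lucene 3.x API; a minimal sketch with a hypothetical
index path, where FSDirectory.open() picks one per platform:

    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.MMapDirectory;
    import org.apache.lucene.store.NIOFSDirectory;
    import org.apache.lucene.store.SimpleFSDirectory;

    public class DirectoryChoice {
        public static void main(String[] args) throws IOException {
            File path = new File("/path/to/index");
            Directory auto = FSDirectory.open(path);        // platform default
            Directory mmap = new MMapDirectory(path);       // page-cache backed reads
            Directory nio = new NIOFSDirectory(path);       // positional read syscalls
            Directory simple = new SimpleFSDirectory(path); // synchronized seek+read
            System.out.println("default here: " + auto.getClass().getSimpleName());
            mmap.close(); nio.close(); simple.close(); auto.close();
        }
    }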

Bill Bell
Sent from mobile


On Jul 22, 2012, at 6:21 AM, geetha anjali  wrote:

> It happens in 3.6; for this reason I thought of moving to solandra.
> If I do a commit, all the documents are persisted without any issues.
> There are no issues in terms of any functionality; the only thing that
> happens is an increase in physical RAM, which goes higher and higher,
> stops at maximum, and never comes down.
> 
> Thanks
> 
> On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog  wrote:
> 
>> Interesting. Which version of Solr is this? What happens if you do a
>> commit?
>> 
>> On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali 
>> wrote:
>>> Hi uwe,
>>> Great to know. We have files indexing 1/min. After 30 mins I see all
>>> my physical memory, say, 100 percent used (Windows). On deep
>>> investigation I found that mmap is not releasing OS file handles. Do you
>>> find this behaviour?
>>> 
>>> Thanks
>>> 
>>> On 20 Jul 2012 14:04, "Uwe Schindler"  wrote:
>>> 
>>> Hi Bill,
>>> 
>>> MMapDirectory uses the file system cache of your operating system, which
>>> has the following consequences: In Linux, top & free should normally
>>> report only *little* free memory, because the O/S uses all memory not
>>> allocated by applications to cache disk I/O (and shows it as allocated,
>>> so having 0% free memory is just normal on Linux and also Windows). If
>>> you have other applications or Lucene/Solr itself that allocate lots of
>>> heap space or malloc() a lot, then you are reducing free physical
>>> memory, so reducing fs cache. This depends also on your swappiness
>>> parameter (if swappiness is higher, inactive processes are swapped out
>>> easier, default is 60% on linux - freeing more space for FS cache - the
>>> backside is of course that maybe in-memory structures of Lucene and
>>> other applications get paged out).
>>> 
>>> You will only see no paging at all if all memory allocated by all
>>> applications plus all mmapped files fits into memory. But paging in/out
>>> the mmapped Lucene index is much cheaper than using SimpleFSDirectory or
>>> NIOFSDirectory. If you use SimpleFS or NIO and your index is not in FS
>>> cache, it will also read it from physical disk again, so where is the
>>> difference? Paging is actually cheaper as no syscalls are involved.
>>> 
>>> If you want as much as possible of your index in physical RAM, copy it to
>>> /dev/null regularly and buy more RAM :-)
>>> 
>>> 
>>> -
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi...
>>> 
 From: Bill Bell [mailto:billnb...@gmail.com]
 Sent: Friday, July 20, 2012 5:17 AM
 Subject: Re: ...
 stop using it? The least used memory will be removed from the OS
 automatically? I see some paging. Wouldn't paging slow down the querying?
>>> 
 
 My index is 10gb and every 8 hours we get most of it in shared memory.
 The memory is 99 percent used, and that does not leave any room for other
 apps.
>>> 
 Other implications?
 
 Sent from my mobile device
 720-256-8076
 
 On Jul 19, 2012, at 9:49 A...
 Heap space or free system RAM:
>>> 
> 
> 
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> 
> Uwe
> ...
>> use it since you might run out of memory on large indexes right?
>>> 
>> 
>> Here is how I got SimpleFSDirectoryFactory to work. Just set
>> -Dsolr.directoryFactor...
>> set it all up with a helper in solrconfig.xml...
>>> 
>> 
>> if (Constants.WINDOWS) {
>> if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64...
>> 
>> 
>> 
>> --
>> Lance Norskog
>> goks...@gmail.com
>> 


"Invalid or unreadable WAR file : .../solr.war" when starting solr 3.6.1 app on Tomcat 7?

2012-07-22 Thread k9157

I've installed

rpm -qa | grep -i ^tomcat-7
tomcat-7.0.27-7.1.noarch

with

update-alternatives --query java | grep Value
Value: /usr/lib64/jvm/jre-1.7.0-openjdk/bin/java

on
GNU/Linux
x86_64
kernel 3.1.10

Tomcat is started & listening @ 127.0.0.1

netstat -pan --tcp | grep 8080
tcp0  0 127.0.0.1:8080  0.0.0.0:*   
   LISTEN  29513/java

@

http://localhost:8080/

I see

Apache Tomcat/7.0.27
If you're seeing this, you've successfully installed Tomcat.
Congratulations!
...

Deploying SOLR 3.6.1

cd /usr/local/apache-solr-3.6.1
/bin/cp -Rf ./example/solr/*   /srv/www/solr/home
/bin/cp -f  ./dist/apache-solr-3.6.1.war  
/srv/www/solr/home/solr.war
/bin/cp -f  ./example/solr/conf/schema.xml
/srv/www/solr/home/conf/

then, define solr/home

cat /etc/tomcat/Catalina/localhost/solr.xml
<Context docBase="/srv/www/solr/home/solr.war"
 privileged="true" allowLinking="true" crossContext="true" >
  <Environment name="solr/home" type="java.lang.String"
   value="/srv/www/solr/home" override="true" />
</Context>

and reference it,

grep dataDir /srv/www/solr/home/conf/solrconfig.xml
  <dataDir>${solr.data.dir:/srv/www/solr/home/data}</dataDir>

restart tomcat, then @:

http://localhost:8080/manager/html

lists the deployed "/solr" app as NOT running,

Path           Version         Display Name                     Running  Sessions
/              None specified  Welcome to Tomcat                true     0
/docs          None specified  Tomcat Documentation             true     0
/examples      None specified  Servlet and JSP Examples         true     0
/host-manager  None specified  Tomcat Host Manager Application  true     1
/manager       None specified  Tomcat Manager Application       true     1
/sample        None specified  Hello, World Application         true     0
/solr          None specified                                   false    0

clicking "start" @ the "/solr" app path link returns,

@ browser,

HTTP Status 404 - /solr
type Status report
message /solr
description The requested resource (/solr) is not available.
Apache Tomcat/7.0.27

& @ logs:


==> /var/log/tomcat/manager.2012-07-22.log <==
Jul 22, 2012 12:03:14 PM
org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: start: Starting web application '/solr'

==> /var/log/tomcat/catalina.2012-07-22.log <==
Jul 22, 2012 12:03:14 PM
org.apache.catalina.core.StandardContext resourcesStart
SEVERE: Error starting static Resources
java.lang.IllegalArgumentException: Invalid or unreadable WAR
file : /srv/www/solr/home/solr.war
at

org.apache.naming.resources.WARDirContext.setDocBase(WARDirContext.java:136)
at

org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4894)
at

org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5074)
at

org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at

org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1247)
at

org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:747)
at

org.apache.catalina.manager.HTMLManagerServlet.doPost(HTMLManagerServlet.java:222)
at
javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
at
javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at

org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at

org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at

org.apache.catalina.filters.CsrfPreventionFilter.doFilter(CsrfPreventionFilter.java:186)
at

org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at

org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at

org.apache.catalina.filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:108)
at

org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at

org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at

org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrappe

Solr performance

2012-07-22 Thread John
Hi,

I have an index of about 50m documents. The fields in this index are
basically hierarchical tokens: token1, token2, ... token10.
When searching the index, I start by getting a list of the query tokens
(1..10) and then requesting the documents that suit those query tokens.

I always want about 1000 results back; therefore I don't query for all
search tokens. My solution right now is to check with rows=0 which token
combination gives me 1000 results +/- 200, and then fetch them. But this
means up to 10 queries to Solr. So I query with token1:x, then with
token1:x AND token2:y, then with token1:x AND token2:y AND token3:z.
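
A hedged SolrJ sketch of that narrowing loop (server URL, clauses, and the
+/- 200 threshold are placeholders from the description above):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class NarrowingSearch {
        public static void main(String[] args) throws SolrServerException {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            String[] clauses = {"token1:x", "token2:y", "token3:z"};
            StringBuilder q = new StringBuilder();
            for (String clause : clauses) {
                if (q.length() > 0) q.append(" AND ");
                q.append(clause);
                SolrQuery probe = new SolrQuery(q.toString());
                probe.setRows(0); // count only, no documents fetched
                long hits = server.query(probe).getResults().getNumFound();
                if (hits <= 1200) { // within 1000 +/- 200, so fetch for real
                    SolrQuery fetch = new SolrQuery(q.toString());
                    fetch.setRows(1000);
                    System.out.println("fetched "
                            + server.query(fetch).getResults().size());
                    break;
                }
            }
        }
    }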

I have 3 questions on this regard:
1. When searching token1:x AND token2:y, Solr actually intersects millions
of documents. Will it work faster if I query token12:xy?
2. For the last 1000 results I run them through a function query; does a
function query work similarly to custom scoring? Which one is faster?
3. Is there a better solution for my problem?

Thanks in advance,
John


Re: How to Increase the number of connexion on Solr/Tomcat6?

2012-07-22 Thread Michael Della Bitta
Bonne chance!

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 22, 2012 at 6:38 AM, Bruno Mannina  wrote:
> Hi Michael,
>
> I uninstalled Tomcat6, Java, etc... and re-installed all packages... I will
> see if it's ok with a new install.
>
> I will keep you informed, thx !!
>
> On 21/07/2012 17:05, Michael Della Bitta wrote:
>
>> Yeah, that's Tomcat's memory leak detector. Technically that's a
>> memory leak, but in practice it won't really amount to much.
>>
>> I'm surprised there are no errors related to your empty response
>> problem in the logs. That is strange, and might point to a problem
>> with your Tomcat install. Perhaps your instinct to use Jetty was the
>> right one after all.
>>
>> Michael Della Bitta
>>
>> 
>> Appinions, Inc. -- Where Influence Isn’t a Game.
>> http://www.appinions.com
>>
>>
>> On Fri, Jul 20, 2012 at 6:36 PM, Bruno Mannina  wrote:
>>>
>>> In the catalina.out, I have only these few rows with:
>>>
>>> .
>>> INFO: Closing Searcher@1faa614 main
>>>
>>> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>>> 15 juil. 2012 13:51:31 org.apache.catalina.loader.WebappClassLoader
>>> clearThreadLocalMap
>>> GRAVE: The web application [/solr] created a ThreadLocal with key of type
>>> [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
>>> [org.apache.solr.schema.DateField$ThreadLocalDateFormat@75a744]) and a
>>> value
>>> of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
>>> (value
>>> [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
>>> but
>>> failed to remove it when the web application was stopped. This is very
>>> likely to create a memory leak.
>>> 15 juil. 2012 13:51:31 org.apache.catalina.loader.WebappClassLoader
>>> clearThreadLocalMap
>>> GRAVE: The web application [/solr] created a ThreadLocal with key of type
>>> [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
>>> [org.apache.solr.schema.DateField$ThreadLocalDateFormat@75a744]) and a
>>> value
>>> of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
>>> (value
>>> [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
>>> but
>>> failed to remove it when the web application was stopped. This is very
>>> likely to create a memory leak.
>>> 15 juil. 2012 13:51:31 org.apache.catalina.loader.WebappClassLoader
>>> clearThreadLocalMap
>>> GRAVE: The web application [/solr] created a ThreadLocal with key of type
>>> [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
>>> [org.apache.solr.schema.DateField$ThreadLocalDateFormat@75a744]) and a
>>> value
>>> of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
>>> (value
>>> [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
>>> but
>>> failed to remove it when the web application was stopped. This is very
>>> likely to create a memory leak.
>>> 15 juil. 2012 13:51:31 org.apache.catalina.loader.WebappClassLoader
>>> clearThreadLocalMap
>>> GRAVE: The web application [/solr] created a ThreadLocal with key of type
>>> [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
>>> [org.apache.solr.schema.DateField$ThreadLocalDateFormat@75a744]) and a
>>> value
>>> of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
>>> (value
>>> [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
>>> but
>>> failed to remove it when the web application was stopped. This is very
>>> likely to create a memory leak.
>>> 15 juil. 2012 13:51:31 org.apache.coyote.http11.Http11Protocol destroy
>>> INFO: Arrêt de Coyote HTTP/1.1 sur http-8983
>>> 15 juil. 2012 13:54:29 org.apache.catalina.startup.ClassLoaderFactory
>>> validateFile
>>> ATTENTION: Problem with directory [/usr/share/tomcat6/server/classes],
>>> exists: [false], isDirectory: [false], canRead: [false]
>>> 15 juil. 2012 13:54:29 org.apache.catalina.startup.ClassLoaderFactory
>>> validateFile
>>> ATTENTION: Problem with directory [/usr/share/tomcat6/server], exists:
>>> [false], isDirectory: [false], canRead: [false]
>>> 15 juil. 2012 13:54:29 org.apache.catalina.startup.ClassLoaderFactory
>>> validateFile
>>> ATTENTION: Problem with directory [/usr/share/tomcat6/shared/classes],
>>> exists: [false], isDirectory: [false], canRead: [false]
>>> 15 juil. 2012 13:54:29 org.apache.catalina.startup.ClassLoaderFactory
>>> validateFile
>>> ATTENTION: Problem with directory [/usr/share/tomcat6/shared], exists:
>>> [false], isDirectory: [false], canRead: [false]
>>> 15 juil. 2012 13:54:29 org.apache.coyote.http11.Http11Protocol init
>>> INFO: Initialisation de Coyote HTTP/1.1 sur http-8983
>>> ...
>>> ...
>>> ...
>>>
>>> On 21/07/2012 00:04, Bruno Mannina wrote:
>>>
>>

Re: using Solr to search for names

2012-07-22 Thread Alireza Salimi
It's almost what I've been doing, but I didn't write my own filter,
I used SynonymFilterFactory.

Thanks

On Sun, Jul 22, 2012 at 12:45 PM, Roman Chyla  wrote:

> Or for names that are more involved, you can use a special
> tokenizer/filter chain and index different variants of the name into
> one index.
>
> example:
> https://github.com/romanchyla/montysolr/blob/solr-trunk/contrib/adsabs/src/java/org/apache/lucene/analysis/synonym/AuthorSynonymFilter.java
>
> roman
>
> On Sun, Jul 22, 2012 at 10:52 AM, Alireza Salimi
>  wrote:
> > Hi Ahmet,
> >
> > Thanks for the reply, Yes, actually after I posted the first question,
> > I found that edismax is very helpful in this use case. There is another
> > problem which is about hyphens in the search query.
> >
> > I guess I need to post it in another email.
> >
> > Thank you very much
> >
> > On Sun, Jul 22, 2012 at 3:35 AM, Ahmet Arslan  wrote:
> >
> >> > So here is the problem, I have a requirement to implement
> >> > search by a
> >> > person name.
> >> > Names consist of
> >> > - first name
> >> >  - middle name
> >> > - last name
> >> > - nickname
> >> >
> >> > there is a list of synonyms which should be applied just for
> >> > first name and
> >> > middle name.
> >> >
> >> > In search, all fields should be searched for the search
> >> > keyword. That's why
> >> > I thought
> >> > maybe having an aggregate field - named 'name' - which keeps
> >> > all fields - by
> >> > copyField tag - can be used for search.
> >> >
> >> > The problem is: how can I apply synonyms for first names and
> >> > middle names,
> >> > when I
> >> > want to copy them into 'name' field?
> >> >
> >> > If you know of any link which is for using Solr to search
> >> > for names,
> >> > I would appreciate if you let me know.
> >>
> >> There is a flexible approach when you want to search over multiple
> fields
> >> having different field types.
> http://wiki.apache.org/solr/ExtendedDisMax
> >> You just specify the list of fields by qf parameter.
> >>
> >> &defType=edismax&qf=firstName^1.2 middleName lastName^1.5 nickname
> >>
> >
> >
> >
> > --
> > Alireza Salimi
> > Java EE Developer
>



-- 
Alireza Salimi
Java EE Developer


Re: using Solr to search for names

2012-07-22 Thread Roman Chyla
Or for names that are more involved, you can use a special
tokenizer/filter chain and index different variants of the name into
one index.

example: 
https://github.com/romanchyla/montysolr/blob/solr-trunk/contrib/adsabs/src/java/org/apache/lucene/analysis/synonym/AuthorSynonymFilter.java

roman

On Sun, Jul 22, 2012 at 10:52 AM, Alireza Salimi
 wrote:
> Hi Ahmet,
>
> Thanks for the reply, Yes, actually after I posted the first question,
> I found that edismax is very helpful in this use case. There is another
> problem which is about hyphens in the search query.
>
> I guess I need to post it in another email.
>
> Thank you very much
>
> On Sun, Jul 22, 2012 at 3:35 AM, Ahmet Arslan  wrote:
>
>> > So here is the problem, I have a requirement to implement
>> > search by a
>> > person name.
>> > Names consist of
>> > - first name
>> >  - middle name
>> > - last name
>> > - nickname
>> >
>> > there is a list of synonyms which should be applied just for
>> > first name and
>> > middle name.
>> >
>> > In search, all fields should be searched for the search
>> > keyword. That's why
>> > I thought
>> > maybe having an aggregate field - named 'name' - which keeps
>> > all fields - by
>> > copyField tag - can be used for search.
>> >
>> > The problem is: how can I apply synonyms for first names and
>> > middle names,
>> > when I
>> > want to copy them into 'name' field?
>> >
>> > If you know of any link which is for using Solr to search
>> > for names,
>> > I would appreciate if you let me know.
>>
>> There is a flexible approach when you want to search over multiple fields
>> having different field types. http://wiki.apache.org/solr/ExtendedDisMax
>> You just specify the list of fields by qf parameter.
>>
>> &defType=edismax&qf=firstName^1.2 middleName lastName^1.5 nickname
>>
>
>
>
> --
> Alireza Salimi
> Java EE Developer


Re: Redirecting SolrQueryRequests to another core with Handler

2012-07-22 Thread Nicholas Ball

Hey Erick,

Managed to do this in the end by reconstructing a new SolrQueryRequest
with a SolrRequestParsers (method buildRequestFrom()) and then calling
core.execute();
Took some fiddling but seems to be working now! :)
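
A rough sketch of the same idea, using LocalSolrQueryRequest rather than
SolrRequestParsers.buildRequestFrom() (core and handler names are
hypothetical):

    import org.apache.solr.core.SolrCore;
    import org.apache.solr.request.LocalSolrQueryRequest;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;

    public class CoreForwarder {
        public static void forward(SolrQueryRequest req, SolrQueryResponse rsp) {
            SolrCore target = req.getCore().getCoreDescriptor()
                    .getCoreContainer().getCore("otherCore");
            try {
                SolrQueryRequest fwd =
                        new LocalSolrQueryRequest(target, req.getParams());
                try {
                    SolrQueryResponse fwdRsp = new SolrQueryResponse();
                    target.execute(target.getRequestHandler("/select"), fwd, fwdRsp);
                    rsp.add("forwarded", fwdRsp.getValues()); // surface inner result
                } finally {
                    fwd.close();
                }
            } finally {
                target.close(); // getCore() increments the core's ref count
            }
        }
    }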

Thanks for the help!
Nick

On Sun, 22 Jul 2012 10:58:16 -0400, Erick Erickson
 wrote:
> Haven't done this in code myself, but take a look at
> MultiCoreExampleJettyTest and the associated base
> class; that might give you some pointers.
> 
> Best
> Erick
> 
> On Thu, Jul 19, 2012 at 9:35 PM, Nicholas Ball
>  wrote:
>>
> >> What is the best way to redirect a SolrQueryRequest to another core from
> >> within a handler (custom SearchHandler)?
> >>
> >> I've tried to find the SolrCore of the core I want to redirect to and
> >> called the execute() method with the same params, but it looks like the
> >> SolrQueryRequest object already has the old core name embedded into it!
> >> I want to do this without making a new request and going through the
> >> servlet etc...
>>
> >> * Note that I had to have an empty core with a special name just to do
> >> this redirection process in the first place; if there is a better way to
> >> proceed with this please let me know too :)
>>
>> Many thanks for any help you can give,
>> Nicholas (incunix)


Re: Index relational XML with DataImportHandler

2012-07-22 Thread Tobias Berg
The articleId field is the only field in the correlation file so I just
need to get that one working.

I tried putting the condition in the forEach section. If I hardcode a value,
like 0104, it works, but it doesn't work with the variable. Haven't looked
at the source code yet, but maybe forEach doesn't support variables? That
could be a nice patch :)

I thought about $skipDoc but can't figure out how I want to use it, since I
want to add the field, it's just that it picks the wrong value. Do you have
something in mind in how to use it for my use-case?

I'll take a look at the source code to see if it can be a bug.

/Tobias

2012/7/22 Alexandre Rafalovitch 

> I am still struggling with nested DIH myself, but I notice that your
> correlation condition is on the field level (@StoreId='${store.id}').
> Were you planning to repeat it for each field definition?
>
> Have you tried putting it instead in the forEach section?
>
> Alternatively, maybe you need to use $skipDoc as in the Wikipedia
> import example?
>
> Regards,
>Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Sat, Jul 21, 2012 at 1:34 PM, Tobias Berg 
> wrote:
> > Hi,
> >
> > I'm trying to index a set of stores and their articles. I have two
> > XML-files, one that contains the data of the stores and one that contains
> > articles for each store. I'm using DIH with XPathEntityProcessor to
> process
> > the file containing the store, and using a nested entity I try to get all
> > articles that belongs to the specific store. The problem I encounter is
> > that every store gets the same articles.
> >
> > For testing purposes I've stripped down the xml-files to only include
> > id:s. The store file (StoresTest.xml) looks like this:
> >
> > <Stores>
> >   <Store>0102</Store>
> >   <Store>0104</Store>
> > </Stores>
> >
> > The Store-Articles relations file (StoreArticlesTest.xml) looks like
> > this:
> >
> > <StoreArticles>
> >   <Store StoreId="0102"><ArticleId>18004</ArticleId></Store>
> >   <Store StoreId="0104"><ArticleId>17004</ArticleId><ArticleId>10004</ArticleId></Store>
> > </StoreArticles>
> >
> > And my dih-config file looks like this:
> >
> > <dataConfig>
> >   <dataSource type="FileDataSource" />
> >   <document>
> >     <entity name="store"
> >             processor="XPathEntityProcessor"
> >             stream="true"
> >             forEach="/Stores/Store"
> >             url="../../../data/StoresTest.xml"
> >             transformer="TemplateTransformer">
> >       <field column="id" xpath="/Stores/Store" />
> >       <entity name="article"
> >               processor="XPathEntityProcessor"
> >               stream="true"
> >               forEach="/StoreArticles"
> >               url="../../../data/StoreArticlesTest.xml"
> >               transformer="LogTransformer"
> >               logTemplate="Processing ${store.id}" logLevel="info"
> >               rootEntity="true">
> >         <field column="articleId"
> >                xpath="/StoreArticles/Store[@StoreId='${store.id}']/ArticleId" />
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > The result I get in Solr is this:
> >
> > <response>
> > ...
> > <result name="response" numFound="2" start="0">
> >   <doc>
> >     <str name="id">0102</str>
> >     <arr name="articleId">
> >       <str>18004</str>
> >     </arr>
> >   </doc>
> >   <doc>
> >     <str name="id">0104</str>
> >     <arr name="articleId">
> >       <str>18004</str>
> >     </arr>
> >   </doc>
> > </result>
> > </response>
> >
> > As you see, both stores get the article for the first store. I would
> > have expected the second store to have two articles: 17004 and 10004.
> >
> > In the log messages printed using LogTransformer I see that each
> > store.id is processed but somehow it only picks up the articles for the
> > first store.
> >
> > Any ideas?
> >
> > /Tobias Berg
>


Re: custom sorter

2012-07-22 Thread Erick Erickson
Wait: by using filter queries with *:*, you're essentially
disabling scoring. *:*
resolves to a ConstantScoreQuery, and filter queries don't contribute any
scoring at all.

It really sounds like you're shooting yourself in the foot by using *:*; what
happens if you use a real q= instead? The QueryElevationComponent (QEV) can be used in that case.
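For reference, a minimal sketch of what that could look like, assuming the stock QueryElevationComponent configuration format; the query text and document id below are placeholders:

  In solrconfig.xml:

    <searchComponent name="elevator" class="solr.QueryElevationComponent">
      <str name="queryFieldType">string</str>
      <str name="config-file">elevate.xml</str>
    </searchComponent>

    <requestHandler name="/select" class="solr.SearchHandler">
      <arr name="last-components">
        <str>elevator</str>
      </arr>
    </requestHandler>

  In elevate.xml:

    <elevate>
      <query text="xyz">
        <doc id="abc" />  <!-- pinned to the top whenever this query text matches -->
      </query>
    </elevate>

The elevated document is forced to the first position; everything else keeps its normal ordering.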

You can certainly define your own search component, but maybe you
can re-think the approach? Or is sorting _really_ the thing you want
to determine the results list, ignoring scoring altogether?

Of course I may be completely off in the weeds.

Best
Erick

On Sun, Jul 22, 2012 at 10:59 AM, Siping Liu  wrote:
> Hi -- thanks for the response. It's the right direction. However, on closer
> look I don't think I can use it directly. The reason is that in my case,
> the query string is always "*:*"; we use filter queries to get different
> results. When fq=(field1:"xyz") we want to boost one document and let sort=
> take care of the rest of the results, and when field1 has another value, sort=
> takes care of all results.
>
> Maybe I can define my own SearchComponent class, and specify it in
> 
>   my_search_component
> 
> I have to try and see if that'd work.
>
> thanks.
>
>
> On Fri, Jul 20, 2012 at 3:24 AM, Lee Carroll
> wrote:
>
>> take a look at
>> http://wiki.apache.org/solr/QueryElevationComponent
>>
>> On 20 July 2012 03:48, Siping Liu  wrote:
>>
>> > Hi,
>> > I have a requirement to place a document at a pre-determined position for
>> > special filter query values, for instance when filter query is
>> > fq=(field1:"xyz") place document abc as first result (the rest of the
>> > result set will be ordered by sort=field2). I guess I have to plug in my
>> > Java code as a custom sorter. I'd appreciate it if someone can shed light
>> > on this (how to add custom sorter, etc.)
>> > TIA.
>> >
>>


Re: custom sorter

2012-07-22 Thread Siping Liu
Hi -- thanks for the response. It's the right direction. However, on closer
look I don't think I can use it directly. The reason is that in my case,
the query string is always "*:*"; we use filter queries to get different
results. When fq=(field1:"xyz") we want to boost one document and let sort=
take care of the rest of the results, and when field1 has another value, sort=
takes care of all results.

Maybe I can define my own SearchComponent class, and specify it in

  my_search_component

I have to try and see if that'd work.
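For what it's worth, a sketch of how that registration could look in solrconfig.xml; the component class name is a placeholder for whatever custom logic ends up reordering the results:

  <searchComponent name="my_search_component"
                   class="com.example.MySearchComponent" />  <!-- hypothetical class -->

  <requestHandler name="/select" class="solr.SearchHandler">
    <arr name="last-components">
      <str>my_search_component</str>
    </arr>
  </requestHandler>

last-components runs the custom component after the standard query component, so it can rearrange the already-collected results.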

thanks.


On Fri, Jul 20, 2012 at 3:24 AM, Lee Carroll
wrote:

> take a look at
> http://wiki.apache.org/solr/QueryElevationComponent
>
> On 20 July 2012 03:48, Siping Liu  wrote:
>
> > Hi,
> > I have a requirement to place a document at a pre-determined position for
> > special filter query values, for instance when filter query is
> > fq=(field1:"xyz") place document abc as first result (the rest of the
> > result set will be ordered by sort=field2). I guess I have to plug in my
> > Java code as a custom sorter. I'd appreciate it if someone can shed light
> > on this (how to add custom sorter, etc.)
> > TIA.
> >
>


Re: Redirecting SolrQueryRequests to another core with Handler

2012-07-22 Thread Erick Erickson
Haven't done this in code myself, but take a look at
MultiCoreJettyExampleTest and the associated base
class; that might give you some pointers.

Best
Erick

On Thu, Jul 19, 2012 at 9:35 PM, Nicholas Ball
 wrote:
>
> What is the best way to redirect a SolrQueryRequest to another core from
> within a handler (custom SearchHandler)?
>
> I've tried to find the SolrCore of the core I want to redirect to and
> called the execute() method with the same params but it looks like the
> SolrQueryRequest object already has the old core name embedded into it! I
> want to do this without making a new request and going through the servlet
> etc...
>
> * Note that I had to have an empty core with a special name just to do
> this redirection process in the first place, if there is a better way to
> proceed with this please let me know too :)
>
> Many thanks for any help you can give,
> Nicholas (incunix)


Re: using Solr to search for names

2012-07-22 Thread Alireza Salimi
Hi Ahmet,

Thanks for the reply. Yes, actually after I posted the first question,
I found that edismax is very helpful in this use case. There is another
problem, which is about hyphens in the search query.

I guess I need to post it in another email.

Thank you very much

On Sun, Jul 22, 2012 at 3:35 AM, Ahmet Arslan  wrote:

> > So here is the problem, I have a requirement to implement
> > search by a
> > person name.
> > Names consist of
> > - first name
> >  - middle name
> > - last name
> > - nickname
> >
> > there is a list of synonyms which should be applied just for
> > first name and
> > middle name.
> >
> > In search, all fields should be searched for the search
> > keyword. That's why
> > I thought
> > maybe having an aggregate field - named 'name' - which keeps
> > all fields - by
> > copyField tag - can be used for search.
> >
> > The problem is: how can I apply synonyms for first names and
> > middle names,
> > when I
> > want to copy them into 'name' field?
> >
> > If you know of any link which is for using Solr to search
> > for names,
> > I would appreciate if you let me know.
>
> There is a flexible approach when you want to search over multiple fields
> having different field types. http://wiki.apache.org/solr/ExtendedDisMax
> You just specify the list of fields via the qf parameter.
>
> &defType=edismax&qf=firstName^1.2 middleName lastName^1.5 nickname
>



-- 
Alireza Salimi
Java EE Developer


Re: Index relational XML with DataImportHandler

2012-07-22 Thread Alexandre Rafalovitch
I am still struggling with nested DIH myself, but I notice that your
correlation condition is on the field level (@StoreId='${store.id}').
Were you planning to repeat it for each field definition?

Have you tried putting it instead in the forEach section?

Alternatively, maybe you need to use $skipDoc as in the Wikipedia
import example?

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sat, Jul 21, 2012 at 1:34 PM, Tobias Berg  wrote:
> Hi,
>
> I'm trying to index a set of stores and their articles. I have two
> XML-files, one that contains the data of the stores and one that contains
> articles for each store. I'm using DIH with XPathEntityProcessor to process
> the file containing the store, and using a nested entity I try to get all
> articles that belong to the specific store. The problem I encounter is
> that every store gets the same articles.
>
> For testing purposes I've stripped down the XML files to only include ids.
> The store file (StoresTest.xml) looks like this:
>
> <?xml version="1.0" encoding="utf-8"?>
> <Stores><Store>0102</Store><Store>0104</Store></Stores>
>
> The Store-Articles relations file (StoreArticlesTest.xml) looks like this:
> <?xml version="1.0" encoding="utf-8"?><StoreArticles>
> <Store StoreId="0102"><ArticleId>18004</ArticleId></Store>
> <Store StoreId="0104"><ArticleId>17004</ArticleId><ArticleId>10004</ArticleId></Store>
> </StoreArticles>
>
> And my dih-config file looks like this:
>
> 
> 
> 
> processor="XPathEntityProcessor"
> stream="true"
> forEach="/Stores/Store"
> url="../../../data/StoresTest.xml"
> transformer="TemplateTransformer"
>>
> 
>  processor="XPathEntityProcessor"
> stream="true"
> forEach="/StoreArticles"
> url="../../../data/StoreArticlesTest.xml"
> transformer="LogTransformer"
> logTemplate="Processing ${store.id}" logLevel="info"
> rootEntity="true">
>  
> 
>
> 
> 
>
> The result I get in Solr is this:
>
> 
> ...
> 
> 
> 0102
> 
> 18004
> 
> 
> 
> 0104
> 
> 18004
> 
> 
> 
> 
>
> As you see, both stores get the article for the first store. I would have
> expected the second store to have two articles: 17004 and 10004.
>
> In the log messages printed using LogTransformer I see that each
> store.id is processed but somehow it only picks up the articles for the
> first store.
>
> Any ideas?
>
> /Tobias Berg


RE: RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Uwe Schindler
Hi,

It seems that both of you simply don't understand what's happening in your
operating system kernel. Please read the blog post again!

> It happens in 3.6; for this reason I thought of moving to Solandra.
> If I do a commit, all documents are persisted without any issues.
> There are no issues in terms of functionality; the only thing that happens
> is that physical RAM usage goes higher and higher, stops at the maximum, and
> never comes down.

This is perfectly fine on Windows and Linux (and any other operating
system). If an operating system did not use *all* available physical
memory, it would waste costly hardware resources. Why not use resources that
would otherwise sit idle? As said before:

The O/S kernel uses *all* available physical RAM for caching file system
accesses. The memory used for that is always reported as not free, because
it is in use (very simple, right?). But if some other application wants to use
it, it's free for malloc(), so it is not permanently occupied. That's always
the case, whether you use MMapDirectory or not (same for SimpleFSDirectory or
NIOFSDirectory).

Of course, a freshly booted kernel reports free memory, but
definitely not on a server that has been running 24/7 for weeks.

For all people who don't want to understand that, here is the easy
explanation page:
http://www.linuxatemyram.com/

> > > all my physical memory reported as 100 percent used (Windows). On deep
> > > investigation I found that mmap is not releasing OS file handles. Do
> > > you find this behaviour?

One comment: the file handles are not freed as long as the index is open.
Used file handles have nothing to do with memory mapping; the two are
completely unrelated.

Uwe

> On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog  wrote:
> 
> > Interesting. Which version of Solr is this? What happens if you do a
> > commit?
> >
> > On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
> > 
> > wrote:
> > > Hi uwe,
> > > Great to know. We have files indexing 1/min. After 30 mins I see
> > > all my physical memory reported as 100 percent used (Windows). On deep
> > > investigation I found that mmap is not releasing OS file handles. Do
> > > you find this behaviour?
> > >
> > > Thanks
> > >
> > > On 20 Jul 2012 14:04, "Uwe Schindler"  wrote:
> > >
> > > Hi Bill,
> > >
> > > MMapDirectory uses the file system cache of your operating system,
> > > which
> > has
> > > following consequences: On Linux, top & free should normally report
> > > only
> > > *little* free memory, because the O/S uses all memory not allocated by
> > > applications to cache disk I/O (and shows it as allocated, so having
> > > 0%
> > free
> > > memory is just normal on Linux and also Windows). If you have other
> > > applications or Lucene/Solr itself that allocate lots of heap space
> > > or
> > > malloc() a lot, then you are reducing free physical memory, so
> > > reducing
> > fs
> > > cache. This depends also on your swappiness parameter (if swappiness
> > > is higher, inactive processes are swapped out easier, default is 60%
> > > on
> > linux -
> > > freeing more space for FS cache - the backside is of course that
> > > maybe in-memory structures of Lucene and other applications get paged
> out).
> > >
> > > You will only see no paging at all if the memory allocated by all
> > applications
> > > plus all mmapped files fits into memory. But paging in/out the mmapped
> > > Lucene
> > > index is much cheaper than using SimpleFSDirectory or
> > NIOFSDirectory. If
> > > you use SimpleFS or NIO and your index is not in FS cache, it will
> > > also
> > read
> > > it from physical disk again, so where is the difference? Paging is
> > actually
> > > cheaper, as no syscalls are involved.
> > >
> > > If you want as much as possible of your index in physical RAM, copy
> > > it to /dev/null regularly and buy more RAM :-)
> > >
> > >
> > > -
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > eMail: uwe@thetaphi...
> > >
> > >> From: Bill Bell [mailto:billnb...@gmail.com]
> > >> Sent: Friday, July 20, 2012 5:17 AM
> > >> Subject: Re: ...
> > >> stop using it? The least used memory will be removed from the OS
> > >> automatically? I see some paging. Wouldn't paging slow down the
> querying?
> > >
> > >>
> > >> My index is 10gb and every 8 hours we get most of it in shared
memory.
> > The
> > >> memory is 99 percent used, and that does not leave any room for
> > >> other
> > > apps.
> > >
> > >> Other implications?
> > >>
> > >> Sent from my mobile device
> > >> 720-256-8076
> > >>
> > >> On Jul 19, 2012, at 9:49 A...
> > >> Heap space or free system RAM:
> > >
> > >> >
> > >> >
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.htm
> > >> > l
> > >> >
> > >> > Uwe
> > >> >...
> > >> >> use it since you might run out of memory on large indexes right?
> > >
> > >> >>
> > >> >> Here is how I got SimpleFSDirectoryFactory to work. Just set -
> > >> >> Dsolr.directoryFactor...
> > >> >> set it all up wit

[ANNOUNCE] Apache Solr 3.6.1 released

2012-07-22 Thread Uwe Schindler
22 July 2012, Apache Solr™ 3.6.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.6.1.

Solr is the popular, blazing fast open source enterprise search platform
from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling, and geospatial
search.
Solr is highly scalable, providing distributed search and index replication,
and it powers the search and navigation features of many of the world's
largest internet sites.

This release is a bug fix release for version 3.6.0. It contains numerous
bug fixes, optimizations, and improvements, some of which are highlighted
below.  The release is available for immediate download at:
   http://lucene.apache.org/solr/mirrors-solr-3x-redir.html (see
note below).

See the CHANGES.txt file included with the release for a full list of
details.

Solr 3.6.1 Release Highlights:

 * The concurrency of MMapDirectory was improved, which caused
   a performance regression in comparison to Solr 3.5.0. This affected
   users with 64bit platforms (Linux, Solaris, Windows) or those
   explicitly using MMapDirectoryFactory.

 * ReplicationHandler "maxNumberOfBackups" was fixed to work if backups are
   triggered on commit.

 * Charset problems were fixed with HttpSolrServer, caused by an upgrade to
   a new Commons HttpClient version in 3.6.0.

 * Grouping was fixed to return correct count when not all shards are
   queried in the second pass. Solr no longer throws Exception when using
   result grouping with main=true and using wt=javabin.

 * Config file replication was made less error prone.

 * Data Import Handler threading fixes.

 * Various minor bugs were fixed.

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Uwe Schindler (release manager)
& all Lucene/Solr developers

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/





Re: Index relational XML with DataImportHandler

2012-07-22 Thread Tobias Berg
My uniqueKey in schema.xml is id. I've tried adding pk="id" to the store
entity but it makes no difference.

The result is the same if I set rootEntity="false" on the store entity.
However, I enabled debug and verbose output in the DataImportHandler and I
noticed a slight change in how the nested queries are executed. Below is
with rootEntity="true":


...
...
full-import
debug




../../../data/StoresTest.xml
0:0:0.1
--- row #1-
0102
/Stores/Store
-

../../../data/StoreArticlesTest.xml
../../../data/StoreArticlesTest.xml
0:0:0.1
0:0:0.1
--- row #1-

18004

/StoreArticles
-

-

18004

/StoreArticles
-




--- row #1-
0104
/Stores/Store
-

../../../data/StoreArticlesTest.xml
../../../data/StoreArticlesTest.xml
../../../data/StoreArticlesTest.xml
../../../data/StoreArticlesTest.xml
0:0:0.0
0:0:0.0
0:0:0.0
0:0:0.0
--- row #1-

18004

/StoreArticles
-

-

18004

/StoreArticles
-






idle
Configuration Re-loaded sucessfully
...
...


And with rootEntity="false":



0
40



import-test-articles-config.xml


full-import
debug



../../../data/StoresTest.xml
../../../data/StoresTest.xml
0:0:0.10
0:0:0.10
--- row #1-
0102
/Stores/Store
-


../../../data/StoreArticlesTest.xml
0:0:0.0
--- row #1-

18004

/StoreArticles
-

-

18004

/StoreArticles
-




--- row #2-
0104
/Stores/Store
-


../../../data/StoreArticlesTest.xml
../../../data/StoreArticlesTest.xml
0:0:0.0
0:0:0.0
--- row #1-

18004

/StoreArticles
-

-

18004

/StoreArticles
-






idle
Configuration Re-loaded sucessfully
...
...


I'm not very familiar with the verbose output but it seems like with
rootEntity="true", one query is made to retrieve the stores and then two,
and four queries are made to the nested store-article. With
rootEntity="false", two queries are made to retrieve the stores and then
one, and two queries are made to the nested store-article. It seems odd
that both these cases produce multiple queries for the second store, but
maybe that's expected?

Anyway, although the queries differ, the result is the same.

/Tobias

2012/7/22 Ahmet Arslan 

> > I'm trying to index a set of stores and their articles. I
> > have two
> > XML-files, one that contains the data of the stores and one
> > that contains
> > articles for each store. I'm using DIH with
> > XPathEntityProcessor to process
> > the file containing the store, and using a nested entity I
> > try to get all
> > articles that belong to the specific store. The problem I
> > encounter is
> > that every store gets the same articles.
> >
> > For testing purposes I've stripped down the XML files to
> > only include ids. The store file (StoresTest.xml) looks
> > like this:
> >
> > <?xml version="1.0" encoding="utf-8"?>
> > <Stores><Store>0102</Store><Store>0104</Store></Stores>
> >
> > The Store-Articles relations file (StoreArticlesTest.xml)
> > looks like this:
> > <?xml version="1.0" encoding="utf-8"?><StoreArticles>
> > <Store StoreId="0102"><ArticleId>18004</ArticleId></Store>
> > <Store StoreId="0104"><ArticleId>17004</ArticleId><ArticleId>10004</ArticleId></Store>
> > </StoreArticles>
> >
> > And my dih-config file looks like this:
> >
> > 
> >  > type="FileDataSource" encoding="UTF-8" />
> > 
> > > processor="XPathEntityProcessor"
> > stream="true"
> > forEach="/Stores/Store"
> > url="../../../data/StoresTest.xml"
> > transformer="TemplateTransformer"
> > >
> > 
> >  > processor="XPathEntityProcessor"
> > stream="true"
> > forEach="/StoreArticles"
> > url="../../../data/StoreArticlesTest.xml"
> > transformer="LogTransformer"
> > logTemplate="Processing ${store.id}" logLevel="info"
> > rootEntity="true">
> >   > xpath="/StoreArticles/Store[@StoreId='${
> > store.id}']/ArticleId" />
> > 
> >
> > 
> > 
> >
> > The result I get in Solr is this:
> >
> > 
> > ...
> > 
> > 
> > 0102
> > 
> > 18004
> > 
> > 
> > 
> > 0104
> > 
> > 18004
> > 
> > 
> > 
> > 
> >
> > As you see, both stores get the article for the first
> > store. I would have
> > expected the second store to have two articles: 17004 and
> > 10004.
> >
> > In the log messages printed using LogTransformer I see that
> > each
> > store.id is processed but somehow it only picks up the
> > articles for the
> > first store.
> >
> > Any ideas?
>
> What happens when you set rootEntity="false"? And what is your uniqueKey in schema.xml?
>


Re: RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread geetha anjali
It happens in 3.6; for this reason I thought of moving to Solandra.
If I do a commit, all documents are persisted without any issues.
There are no issues in terms of functionality; the only thing that happens is
that physical RAM usage goes higher and higher, stops at the maximum, and it
never comes down.

Thanks

On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog  wrote:

> Interesting. Which version of Solr is this? What happens if you do a
> commit?
>
> On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali 
> wrote:
> > Hi uwe,
> > Great to know. We have files indexing 1/min. After 30 mins I see all
> > my physical memory reported as 100 percent used (Windows). On deep
> > investigation I found that mmap is not releasing OS file handles. Do you
> > find this behaviour?
> >
> > Thanks
> >
> > On 20 Jul 2012 14:04, "Uwe Schindler"  wrote:
> >
> > Hi Bill,
> >
> > MMapDirectory uses the file system cache of your operating system, which
> has
> > following consequences: On Linux, top & free should normally report only
> > *little* free memory, because the O/S uses all memory not allocated by
> > applications to cache disk I/O (and shows it as allocated, so having 0%
> free
> > memory is just normal on Linux and also Windows). If you have other
> > applications or Lucene/Solr itself that allocate lots of heap space or
> > malloc() a lot, then you are reducing free physical memory, so reducing
> fs
> > cache. This depends also on your swappiness parameter (if swappiness is
> > higher, inactive processes are swapped out easier, default is 60% on
> linux -
> > freeing more space for FS cache - the backside is of course that maybe
> > in-memory structures of Lucene and other applications get paged out).
> >
> > You will only see no paging at all if the memory allocated by all
> applications
> > plus all mmapped files fits into memory. But paging in/out the mmapped Lucene
> > index is much cheaper than using SimpleFSDirectory or
> NIOFSDirectory. If
> > you use SimpleFS or NIO and your index is not in FS cache, it will also
> read
> > it from physical disk again, so where is the difference? Paging is
> actually
> > cheaper, as no syscalls are involved.
> >
> > If you want as much as possible of your index in physical RAM, copy it to
> > /dev/null regularly and buy more RAM :-)
> >
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi...
> >
> >> From: Bill Bell [mailto:billnb...@gmail.com]
> >> Sent: Friday, July 20, 2012 5:17 AM
> >> Subject: Re: ...
> >> stop using it? The least used memory will be removed from the OS
> >> automatically? I see some paging. Wouldn't paging slow down the querying?
> >
> >>
> >> My index is 10gb and every 8 hours we get most of it in shared memory.
> The
> >> memory is 99 percent used, and that does not leave any room for other
> > apps.
> >
> >> Other implications?
> >>
> >> Sent from my mobile device
> >> 720-256-8076
> >>
> >> On Jul 19, 2012, at 9:49 A...
> >> Heap space or free system RAM:
> >
> >> >
> >> >
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.htm
> >> > l
> >> >
> >> > Uwe
> >> >...
> >> >> use it since you might run out of memory on large indexes right?
> >
> >> >>
> >> >> Here is how I got SimpleFSDirectoryFactory to work. Just set -
> >> >> Dsolr.directoryFactor...
> >> >> set it all up with a helper in solrconfig.xml...
> >
> >> >>
> >> >> if (Constants.WINDOWS) {
> >> >> if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64...
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
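Coming back to the thread's subject line, a minimal sketch of how the directory factory can be selected in solrconfig.xml, assuming the stock Solr factories; the property fallback mirrors the -Dsolr.directoryFactory flag quoted above:

  <!-- solrconfig.xml: choose the factory via a system property,
       falling back to the standard (platform-chosen) factory -->
  <directoryFactory name="DirectoryFactory"
      class="${solr.directoryFactory:solr.StandardDirectoryFactory}" />

Starting the JVM with -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory then switches the index away from memory-mapped I/O.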


Re: How to Increase the number of connections on Solr/Tomcat6?

2012-07-22 Thread Bruno Mannina

Hi Michael,

I uninstalled Tomcat6, Java, etc. and re-installed all packages... I will
see if it's OK with a new install.


I will keep you informed, thx!!

On 21/07/2012 17:05, Michael Della Bitta wrote:

Yeah, that's Tomcat's memory leak detector. Technically that's a
memory leak, but in practice it won't really amount to much.

I'm surprised there are no errors related to your empty response
problem in the logs. That is strange, and might point to a problem
with your Tomcat install. Perhaps your instinct to use Jetty was the
right one after all.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Fri, Jul 20, 2012 at 6:36 PM, Bruno Mannina  wrote:

In the catalina.out, I have only these few rows with:

.
INFO: Closing Searcher@1faa614 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
15 juil. 2012 13:51:31 org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
GRAVE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@75a744]) and a value
of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value
[org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but
failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
15 juil. 2012 13:51:31 org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
GRAVE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@75a744]) and a value
of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value
[org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but
failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
15 juil. 2012 13:51:31 org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
GRAVE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@75a744]) and a value
of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value
[org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but
failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
15 juil. 2012 13:51:31 org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
GRAVE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@75a744]) and a value
of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value
[org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but
failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
15 juil. 2012 13:51:31 org.apache.coyote.http11.Http11Protocol destroy
INFO: Arrêt de Coyote HTTP/1.1 sur http-8983
15 juil. 2012 13:54:29 org.apache.catalina.startup.ClassLoaderFactory
validateFile
ATTENTION: Problem with directory [/usr/share/tomcat6/server/classes],
exists: [false], isDirectory: [false], canRead: [false]
15 juil. 2012 13:54:29 org.apache.catalina.startup.ClassLoaderFactory
validateFile
ATTENTION: Problem with directory [/usr/share/tomcat6/server], exists:
[false], isDirectory: [false], canRead: [false]
15 juil. 2012 13:54:29 org.apache.catalina.startup.ClassLoaderFactory
validateFile
ATTENTION: Problem with directory [/usr/share/tomcat6/shared/classes],
exists: [false], isDirectory: [false], canRead: [false]
15 juil. 2012 13:54:29 org.apache.catalina.startup.ClassLoaderFactory
validateFile
ATTENTION: Problem with directory [/usr/share/tomcat6/shared], exists:
[false], isDirectory: [false], canRead: [false]
15 juil. 2012 13:54:29 org.apache.coyote.http11.Http11Protocol init
INFO: Initialisation de Coyote HTTP/1.1 sur http-8983
...
...
...

On 21/07/2012 00:04, Bruno Mannina wrote:


On 21/07/2012 00:02, Bruno Mannina wrote:

On 21/07/2012 00:00, Bruno Mannina wrote:

catalina.out <-- twice

Sorry, concerning this file, I did a
sudo cat .. | more and it's OK, I see the content


And inside the catalina.out I have all my requests, without error or
missing requests

:'( it's amazing
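As for the subject line (increasing the number of connections on Solr/Tomcat6), a sketch of the Tomcat 6 connector attributes that bound concurrency, in server.xml; the numbers are illustrative, not recommendations:

  <!-- server.xml: the HTTP connector serving Solr.
       maxThreads caps concurrent request-processing threads;
       acceptCount is the queue length once all threads are busy. -->
  <Connector port="8983" protocol="HTTP/1.1"
             maxThreads="400"
             acceptCount="200"
             connectionTimeout="20000" />

Requests arriving beyond maxThreads plus acceptCount are refused by Tomcat.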








Re: using Solr to search for names

2012-07-22 Thread Ahmet Arslan
> So here is the problem, I have a requirement to implement
> search by a
> person name.
> Names consist of
> - first name
>  - middle name
> - last name
> - nickname
> 
> there is a list of synonyms which should be applied just for
> first name and
> middle name.
> 
> In search, all fields should be searched for the search
> keyword. That's why
> I thought
> maybe having an aggregate field - named 'name' - which keeps
> all fields - by
> copyField tag - can be used for search.
> 
> The problem is: how can I apply synonyms for first names and
> middle names,
> when I
> want to copy them into 'name' field?
> 
> If you know of any link which is for using Solr to search
> for names,
> I would appreciate if you let me know.

There is a flexible approach when you want to search over multiple fields 
having different field types. http://wiki.apache.org/solr/ExtendedDisMax
You just specify the list of fields via the qf parameter.

&defType=edismax&qf=firstName^1.2 middleName lastName^1.5 nickname
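As for limiting synonyms to first and middle names, a sketch of how that could look in schema.xml, assuming a dedicated field type for the synonym-expanded fields; the type, field, and file names are placeholders:

  <fieldType name="text_name_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- synonym expansion applies only to fields of this type -->
      <filter class="solr.SynonymFilterFactory" synonyms="name_synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>

  <field name="firstName"  type="text_name_syn" indexed="true" stored="true"/>
  <field name="middleName" type="text_name_syn" indexed="true" stored="true"/>
  <field name="lastName"   type="text"          indexed="true" stored="true"/>
  <field name="nickname"   type="text"          indexed="true" stored="true"/>

Querying the fields directly through qf, instead of one copyField catch-all, keeps the synonym expansion scoped to just the name parts that need it.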


Re: Index relational XML with DataImportHandler

2012-07-22 Thread Ahmet Arslan
> I'm trying to index a set of stores and their articles. I
> have two
> XML-files, one that contains the data of the stores and one
> that contains
> articles for each store. I'm using DIH with
> XPathEntityProcessor to process
> the file containing the store, and using a nested entity I
> try to get all
> articles that belong to the specific store. The problem I
> encounter is
> that every store gets the same articles.
> 
> For testing purposes I've stripped down the XML files to
> only include ids. The store file (StoresTest.xml) looks
> like this:
> 
> <?xml version="1.0" encoding="utf-8"?>
> <Stores><Store>0102</Store><Store>0104</Store></Stores>
> 
> The Store-Articles relations file (StoreArticlesTest.xml)
> looks like this:
> <?xml version="1.0" encoding="utf-8"?><StoreArticles>
> <Store StoreId="0102"><ArticleId>18004</ArticleId></Store>
> <Store StoreId="0104"><ArticleId>17004</ArticleId><ArticleId>10004</ArticleId></Store>
> </StoreArticles>
> 
> And my dih-config file looks like this:
> 
> 
>          type="FileDataSource" encoding="UTF-8" />
>         
>     processor="XPathEntityProcessor"
> stream="true"
> forEach="/Stores/Store"
> url="../../../data/StoresTest.xml"
> transformer="TemplateTransformer"
> >
> 
>  processor="XPathEntityProcessor"
> stream="true"
> forEach="/StoreArticles"
> url="../../../data/StoreArticlesTest.xml"
> transformer="LogTransformer"
> logTemplate="Processing ${store.id}" logLevel="info"
> rootEntity="true">
>   xpath="/StoreArticles/Store[@StoreId='${
> store.id}']/ArticleId" />
> 
>    
> 
> 
> 
> The result I get in Solr is this:
> 
> 
> ...
> 
> 
> 0102
> 
> 18004
> 
> 
> 
> 0104
> 
> 18004
> 
> 
> 
> 
> 
> As you see, both stores get the article for the first
> store. I would have
> expected the second store to have two articles: 17004 and
> 10004.
> 
> In the log messages printed using LogTransformer I see that
> each
> store.id is processed but somehow it only picks up the
> articles for the
> first store.
> 
> Any ideas?

What happens when you set rootEntity="false"? What is your uniqueKey in schema.xml?