Any way to get top 'n' queries searched from Solr?

2010-04-30 Thread Praveen Agrawal
Hi,
I need to know the top 'n' (say 100) search queries that users have tried,
i.e. the most frequently searched queries and their frequencies. Does Solr keep
this information and can it return it, or what other options do I have here?
Thanks,
Praveen


Re: ubuntu lucid package

2010-04-30 Thread Gora Mohanty
On Thu, 29 Apr 2010 19:54:49 -0700 (PDT)
Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

 Pablo, Ubuntu Lucid is *brand* new :)
 
 try:
 find / -name \*solr\*
 or 
 locate solr.war
[...]

Also, the standard Debian/Ubuntu way of finding out what files a
package installed is:
  dpkg -L pkg_name

Regards,
Gora


AW: Slow Date-Range Queries

2010-04-30 Thread Jan Simon Winkelmann
For now I need them. However, as suggested by Ahmet Arslan, I will most likely
create another boolean field to get rid of them, simply because I am switching
to Solr 1.4 frange queries.

On the topic of frange queries, is there a way to simulate the date range
wildcards here? They don't seem to work with frange.
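
For reference, the kind of frange query I mean (just a sketch; the field name
is made up, and as far as I know ms() in 1.4 only works on trie-based date
fields):

    fq={!frange l=0 u=86400000}ms(NOW,timestamp)

i.e. keep only documents whose timestamp lies within the last 24 hours
(86,400,000 ms).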

 Do you really need the *:* stuff in the date range subqueries? That
 may add to the execution time.


Re: Any way to get top 'n' queries searched from Solr?

2010-04-30 Thread Peter Sturge
As far as I'm aware, this information isn't stored intrinsically in Solr.

We had a similar requirement whereby we needed to keep track of which searches
had been performed by particular users.
This is more of a security audit requirement than a generic search feature,
but the solution was to audit (in a SearchComponent) all users' search
activity. This auditing can then be written back to the index (or, perhaps
more preferably, a separate index), which can then be searched in the normal
way.

You could adopt the same strategy for your requirement. If you want to see
how we did this, have a look at SOLR-1872.

Thanks,
Peter




On Fri, Apr 30, 2010 at 7:14 AM, Praveen Agrawal pkal...@gmail.com wrote:

 Hi,
 I need to know the top 'n' (say 100) search queries that users have tried,
 i.e. the most frequently searched queries and their frequencies. Does Solr keep
 this information and can it return it, or what other options do I have here?
 Thanks,
 Praveen



Re: ubuntu lucid package

2010-04-30 Thread pablo platt
http://localhost:8080/solr/admin/ gives me the solr admin.
thanks

On Fri, Apr 30, 2010 at 10:24 AM, Gora Mohanty g...@srijan.in wrote:

 On Thu, 29 Apr 2010 19:54:49 -0700 (PDT)
 Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

  Pablo, Ubuntu Lucid is *brand* new :)
 
  try:
  find / -name \*solr\*
  or
  locate solr.war
 [...]

 Also, the standard Debian/Ubuntu way of finding out what files a
 package installed is:
  dpkg -L pkg_name

 Regards,
 Gora



RE: Problem with pdf, upgrading Cell

2010-04-30 Thread pk

Mark,
did you manage to get it working?

I tried the latest Tika (0.7) from the command line and it successfully parsed
the earlier problematic PDF. Then I replaced the Tika-related jars in the Solr
1.4 contrib/extraction/lib folder with the new ones. Now it doesn't throw any
exception, but there is no content extraction, only metadata! It now doesn't
even extract content from PDFs which it handled earlier (v0.4). Strange...



RE: Problem with pdf, upgrading Cell

2010-04-30 Thread Sandhya Agarwal
I observed the same issue with the Tika 0.7 jars. It now fails to extract
content from documents of any type. It works with Tika 0.5, though.

Thanks,
Sandhya

-Original Message-
From: pk [mailto:pkal...@gmail.com] 
Sent: Friday, April 30, 2010 3:17 PM
To: solr-user@lucene.apache.org
Subject: RE: Problem with pdf, upgrading Cell


Mark,
did you manage to get it working?

I tried the latest Tika (0.7) from the command line and it successfully parsed
the earlier problematic PDF. Then I replaced the Tika-related jars in the Solr
1.4 contrib/extraction/lib folder with the new ones. Now it doesn't throw any
exception, but there is no content extraction, only metadata! It now doesn't
even extract content from PDFs which it handled earlier (v0.4). Strange...



Re: Any way to get top 'n' queries searched from Solr?

2010-04-30 Thread pk

Peter,
It seems that your solution (SOLR-1872) requires authentication too (with
users tracked via your UUID), but my users will be the general public using
browsers, and I can't force any such auth restrictions on them. Also, you
didn't mention whether you are already persisting the audit data; I may need
to extend it to work for my problem.

My requirement is simple: to know the top n query strings with their
frequencies, etc.
Thanks though.


Re: Any way to get top 'n' queries searched from Solr?

2010-04-30 Thread MitchK

The simplest way is to send the query string to your Solr client *and* to
your custom query tracker, which could be any database you like. Doing so,
you can count how often each query was sent, etc.
*And* you can make those queries searchable by exporting the datasets to
another Solr core.
Why an extra DB?
Because if a crash occurs, Solr gives you no guarantees. Keep in mind that
Solr is only an index/search server, not a real database.

This is the easiest way to implement such a feature, I think.
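
To make it concrete, a rough sketch of the counting side, assuming a table
query_log(q VARCHAR PRIMARY KEY, cnt INT) and MySQL's upsert syntax (the table
name and the SQL dialect are just examples):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class QueryCounter {
        // Count one occurrence of a query string
        // (MySQL-style upsert; adapt to your DB).
        public static void count(Connection con, String q) throws SQLException {
            PreparedStatement ps = con.prepareStatement(
                "INSERT INTO query_log (q, cnt) VALUES (?, 1) " +
                "ON DUPLICATE KEY UPDATE cnt = cnt + 1");
            try {
                ps.setString(1, q);
                ps.executeUpdate();
            } finally {
                ps.close();
            }
        }
        // The top n is then just:
        //   SELECT q, cnt FROM query_log ORDER BY cnt DESC LIMIT n
    }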

Good luck.
- Mitch


Re: Any way to get top 'n' queries searched from Solr?

2010-04-30 Thread Abdelhamid ABID
Hi,
Why don't you just create a servlet filter in the Solr web context? That way
you can grab the user's q param and persist it.
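
Roughly like this (a minimal sketch; the class name is made up, and the filter
would have to be mapped in Solr's web.xml in front of SolrDispatchFilter):

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    // Grab the q parameter of every request and persist it before
    // letting Solr handle the request.
    public class QueryLogFilter implements Filter {
        public void init(FilterConfig config) throws ServletException {}

        public void doFilter(ServletRequest req, ServletResponse res,
                             FilterChain chain)
                throws IOException, ServletException {
            String q = req.getParameter("q");
            if (q != null) {
                // Persist q however you like (file, DB, ...);
                // System.out is just a placeholder.
                System.out.println("query: " + q);
            }
            chain.doFilter(req, res);
        }

        public void destroy() {}
    }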

On 4/30/10, pk pkal...@gmail.com wrote:


 Peter,
 It seems that your solution (SOLR-1872) requires authentication too (with
 users tracked via your UUID), but my users will be the general public using
 browsers, and I can't force any such auth restrictions on them. Also, you
 didn't mention whether you are already persisting the audit data; I may need
 to extend it to work for my problem.

 My requirement is simple: to know the top n query strings with their
 frequencies, etc.
 Thanks though.





-- 
Abdelhamid ABID
Software Engineer- J2EE / WEB


Re: Any way to get top 'n' queries searched from Solr?

2010-04-30 Thread Praveen Agrawal
Thanks Mitch..
I have an application fronting Solr for updating/searching etc., and I'll
make use of it to store this info.

Thanks to all for suggestions.


On Fri, Apr 30, 2010 at 3:43 PM, MitchK mitc...@web.de wrote:


 The simplest way is to send the query string to your Solr client *and* to
 your custom query tracker, which could be any database you like. Doing so,
 you can count how often each query was sent, etc.
 *And* you can make those queries searchable by exporting the datasets to
 another Solr core.
 Why an extra DB?
 Because if a crash occurs, Solr gives you no guarantees. Keep in mind that
 Solr is only an index/search server, not a real database.

 This is the easiest way to implement such a feature, I think.

 Good luck.
 - Mitch



Re: Any way to get top 'n' queries searched from Solr?

2010-04-30 Thread Peter Sturge
Yes, you're right, SOLR-1872 is for security authorization, and part of this
is to audit what users are searching. The reference to this was to show you
how your requirement can be accomplished.

To have just the auditing and not the security, you'd need to create your
own SearchComponent and extract out just the auditing bits, or remove the
security bits. This shouldn't be too difficult to do.
Or, you can simply see how it's done, and create your own SearchComponent
and use the same technique. Your SearchComponent should work out a lot
simpler than the one in SOLR-1872.

In the case of audit persistence in SOLR-1872, audit events are written to a
log file, which just happens to be monitored by an external file monitor
which can feed new log entries to other sources (e.g. another index, an
external log repository etc.). It's done this way to keep any external audit
routing/delivery separate from the webapp (it's not part of solr.war's remit
to do audit routing).

For your requirement, you'll probably want to write audited searches
directly into a Solr index, either the same one as is being searched or a
different one (a different one is better, so your public users don't have
access to your search stats). You can use any of the available /update
mechanisms to accomplish this.
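
To give you an idea, here is a stripped-down sketch of such a component (the
class name is made up, and the real SOLR-1872 component does considerably
more). It would be registered in solrconfig.xml and added to your handler's
component list:

    import java.io.IOException;
    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class QueryAuditComponent extends SearchComponent {
        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            String q = rb.req.getParams().get(CommonParams.Q);
            if (q != null) {
                // Persist the query here: append it to an audit log, or
                // post it to a separate stats core via /update.
            }
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // Nothing to do; the auditing happens in prepare().
        }

        @Override
        public String getDescription() { return "query auditing"; }
        @Override
        public String getVersion() { return "1.0"; }
        @Override
        public String getSourceId() { return "queryaudit"; }
        @Override
        public String getSource() { return "queryaudit"; }
    }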


Thanks,
Peter


On Fri, Apr 30, 2010 at 11:08 AM, pk pkal...@gmail.com wrote:


 Peter,
 It seems that your solution (SOLR-1872) requires authentication too (with
 users tracked via your UUID), but my users will be the general public using
 browsers, and I can't force any such auth restrictions on them. Also, you
 didn't mention whether you are already persisting the audit data; I may need
 to extend it to work for my problem.

 My requirement is simple: to know the top n query strings with their
 frequencies, etc.
 Thanks though.



Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Grant Ingersoll
Can you share the PDF it is failing on?  FWIW, PDFs are notoriously hard to 
extract.  They come in all shapes and flavors and I've seen many a commercial 
extractor fail on them too.  Have you tried using either Tika standalone or 
PDFBox standalone?  Does the file work there?

On Apr 26, 2010, at 8:35 AM, Marc Ghorayeb wrote:

 
 Okay, I've been digging a little bit through the Java code from the SVN, and
 it seems the load function inside the ExtractingDocumentLoader class does not
 receive the ContentStream (it is set to null...). Maybe I should send this to
 the developer mailing list?
 Marc
 
 From: dekay...@hotmail.com
 To: solr-user@lucene.apache.org
 Subject: RE: Problem with pdf, upgrading Cell
 Date: Fri, 23 Apr 2010 16:03:28 +0200
 
 
 Seems like I'm not the only one with this no-extraction problem:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
 Apparently he tried the same thing, building from the trunk and indexing a
 PDF, and no extraction occurred... Strange.
 Marc G.


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Indexing metadata in solr using ContentStreamUpdateRequest

2010-04-30 Thread Grant Ingersoll
What does your schema look like?

On Apr 30, 2010, at 3:47 AM, Sandhya Agarwal wrote:

 Hello,
 
 I am using ContentStreamUpdateRequest to index binary documents. At the time
 of indexing the content, I want to be able to index some additional metadata
 as well. I believe this metadata must be provided prefixed with *literal*.
 For instance, I have a field named “field1” defined in schema.xml, and to
 index a document with a value for this field, I would provide
 “literal.field1” = value.
 
 However, this does not seem to be working and the field defined in
 schema.xml, *field1*, does not have any data indexed.
 
 How can I get this working ?
 
 Thanks in advance.
 
 Thanks,
 Sandhya

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: ubuntu lucid package

2010-04-30 Thread Olivier Dobberkau

On 30.04.2010, at 09:24, Gora Mohanty wrote:

 Also, the standard Debian/Ubuntu way of finding out what files a
 package installed is:
   dpkg -L pkg_name
 
 Regards,
 Gora

You might try:

# dpkg -L solr-common
/.
/etc
/etc/solr
/etc/solr/web.xml
/etc/solr/conf
/etc/solr/conf/admin-extra.html
/etc/solr/conf/elevate.xml
/etc/solr/conf/mapping-ISOLatin1Accent.txt
/etc/solr/conf/protwords.txt
/etc/solr/conf/schema.xml
/etc/solr/conf/scripts.conf
/etc/solr/conf/solrconfig.xml
/etc/solr/conf/spellings.txt
/etc/solr/conf/stopwords.txt
/etc/solr/conf/synonyms.txt
/etc/solr/conf/xslt
/etc/solr/conf/xslt/example.xsl
/etc/solr/conf/xslt/example_atom.xsl
/etc/solr/conf/xslt/example_rss.xsl
/etc/solr/conf/xslt/luke.xsl
/usr
/usr/share
/usr/share/solr
/usr/share/solr/WEB-INF
/usr/share/solr/WEB-INF/lib
/usr/share/solr/WEB-INF/lib/apache-solr-core-1.4.0.jar
/usr/share/solr/WEB-INF/lib/apache-solr-dataimporthandler-1.4.0.jar
/usr/share/solr/WEB-INF/lib/apache-solr-solrj-1.4.0.jar
/usr/share/solr/WEB-INF/weblogic.xml
/usr/share/solr/scripts
/usr/share/solr/scripts/abc
/usr/share/solr/scripts/abo
/usr/share/solr/scripts/backup
/usr/share/solr/scripts/backupcleaner
/usr/share/solr/scripts/commit
/usr/share/solr/scripts/optimize
/usr/share/solr/scripts/readercycle
/usr/share/solr/scripts/rsyncd-disable
/usr/share/solr/scripts/rsyncd-enable
/usr/share/solr/scripts/rsyncd-start
/usr/share/solr/scripts/rsyncd-stop
/usr/share/solr/scripts/scripts-util
/usr/share/solr/scripts/snapcleaner
/usr/share/solr/scripts/snapinstaller
/usr/share/solr/scripts/snappuller
/usr/share/solr/scripts/snappuller-disable
/usr/share/solr/scripts/snappuller-enable
/usr/share/solr/scripts/snapshooter
/usr/share/solr/admin
/usr/share/solr/admin/_info.jsp
/usr/share/solr/admin/action.jsp
/usr/share/solr/admin/analysis.jsp
/usr/share/solr/admin/analysis.xsl
/usr/share/solr/admin/distributiondump.jsp
/usr/share/solr/admin/favicon.ico
/usr/share/solr/admin/form.jsp
/usr/share/solr/admin/get-file.jsp
/usr/share/solr/admin/get-properties.jsp
/usr/share/solr/admin/header.jsp
/usr/share/solr/admin/index.jsp
/usr/share/solr/admin/jquery-1.2.3.min.js
/usr/share/solr/admin/meta.xsl
/usr/share/solr/admin/ping.jsp
/usr/share/solr/admin/ping.xsl
/usr/share/solr/admin/raw-schema.jsp
/usr/share/solr/admin/registry.jsp
/usr/share/solr/admin/registry.xsl
/usr/share/solr/admin/replication
/usr/share/solr/admin/replication/header.jsp
/usr/share/solr/admin/replication/index.jsp
/usr/share/solr/admin/schema.jsp
/usr/share/solr/admin/solr-admin.css
/usr/share/solr/admin/solr_small.png
/usr/share/solr/admin/stats.jsp
/usr/share/solr/admin/stats.xsl
/usr/share/solr/admin/tabular.xsl
/usr/share/solr/admin/threaddump.jsp
/usr/share/solr/admin/threaddump.xsl
/usr/share/solr/admin/debug.jsp
/usr/share/solr/admin/dataimport.jsp
/usr/share/solr/favicon.ico
/usr/share/solr/index.jsp
/usr/share/doc
/usr/share/doc/solr-common
/usr/share/doc/solr-common/changelog.Debian.gz
/usr/share/doc/solr-common/README.Debian
/usr/share/doc/solr-common/TODO.Debian
/usr/share/doc/solr-common/copyright
/usr/share/doc/solr-common/changelog.gz
/usr/share/doc/solr-common/NOTICE.txt.gz
/usr/share/doc/solr-common/README.txt.gz
/var
/var/lib
/var/lib/solr
/var/lib/solr/data
/usr/share/solr/WEB-INF/lib/xml-apis.jar
/usr/share/solr/WEB-INF/lib/xml-apis-ext.jar
/usr/share/solr/WEB-INF/lib/slf4j-jdk14.jar
/usr/share/solr/WEB-INF/lib/slf4j-api.jar
/usr/share/solr/WEB-INF/lib/lucene-spellchecker.jar
/usr/share/solr/WEB-INF/lib/lucene-snowball.jar
/usr/share/solr/WEB-INF/lib/lucene-queries.jar
/usr/share/solr/WEB-INF/lib/lucene-highlighter.jar
/usr/share/solr/WEB-INF/lib/lucene-core.jar
/usr/share/solr/WEB-INF/lib/lucene-analyzers.jar
/usr/share/solr/WEB-INF/lib/jetty-util.jar
/usr/share/solr/WEB-INF/lib/jetty.jar
/usr/share/solr/WEB-INF/lib/commons-io.jar
/usr/share/solr/WEB-INF/lib/commons-httpclient.jar
/usr/share/solr/WEB-INF/lib/commons-fileupload.jar
/usr/share/solr/WEB-INF/lib/commons-csv.jar
/usr/share/solr/WEB-INF/lib/commons-codec.jar
/usr/share/solr/WEB-INF/web.xml
/usr/share/solr/conf

If I reckon correctly, some parts of Apache Solr will not work with the Ubuntu
Lucid distribution.

http://solr.dkd.local/update/extract throws an error:

The server encountered an internal error (lazy loading error
org.apache.solr.common.SolrException: lazy loading error at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
at

Maybe someone from ubuntu reading this list can confirm this.
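
For what it's worth, in the stock 1.4 example solrconfig.xml that handler is
registered lazily, roughly as:

    <requestHandler name="/update/extract"
                    class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                    startup="lazy" />

A lazy loading error there usually means the class cannot be loaded when the
handler is first used. The dpkg listing above shows no extraction (Solr
Cell/Tika) jars in WEB-INF/lib, which would fit, but that is only my guess.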

Olivier
--

Olivier Dobberkau

d.k.d Internet Service GmbH
Kaiserstraße 73
60329 Frankfurt/Main

mail: olivier.dobber...@dkd.de
web: http://www.dkd.de


RE: Indexing metadata in solr using ContentStreamUpdateRequest

2010-04-30 Thread Sandhya Agarwal
Thanks, Grant.

I resolved this issue by doing the following:

For each of my own metadata fields, it is also required to define the mapping
between the Tika field and the Solr field, either in solrconfig.xml or while
submitting the request for indexing. I also had to make sure that
lowernames=false, since the field names defined in my schema.xml are in mixed
or upper case. This solved the issue for me.
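
For anyone else hitting this, roughly what my request looks like now (a
sketch; the URL, the file and the field names are placeholders):

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class ExtractExample {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");
            req.addFile(new File("mydoc.pdf"));        // the binary document
            req.setParam("literal.myField1", "value"); // metadata for a schema field
            req.setParam("lowernames", "false");       // keep mixed-case field names
            req.setParam("fmap.content", "text");      // map Tika's body to a field
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

            server.request(req);
        }
    }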

Thanks,
Sandhya

-Original Message-
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
Sent: Friday, April 30, 2010 4:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing metadata in solr using ContentStreamUpdateRequest

What does your schema look like?

On Apr 30, 2010, at 3:47 AM, Sandhya Agarwal wrote:

 Hello,
 
 I am using ContentStreamUpdateRequest to index binary documents. At the time
 of indexing the content, I want to be able to index some additional metadata
 as well. I believe this metadata must be provided prefixed with *literal*.
 For instance, I have a field named “field1” defined in schema.xml, and to
 index a document with a value for this field, I would provide
 “literal.field1” = value.
 
 However, this does not seem to be working and the field defined in
 schema.xml, *field1*, does not have any data indexed.
 
 How can I get this working ?
 
 Thanks in advance.
 
 Thanks,
 Sandhya

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Praveen Agrawal
I did try the standalone version of Tika 0.7, and it extracted PDF content
successfully. Then I replaced the Tika-related jars in contrib/extraction/lib
of the Solr 1.4 distribution with their newer versions, and now it doesn't
extract content from ANY PDF.
Earlier (0.4) it threw exceptions for a few PDFs, but now there is no content
and no exception.


On Fri, Apr 30, 2010 at 4:14 PM, Grant Ingersoll gsing...@apache.orgwrote:

 Can you share the PDF it is failing on?  FWIW, PDFs are notoriously hard to
 extract.  They come in all shapes and flavors and I've seen many a
 commercial extractor fail on them too.  Have you tried using either Tika
 standalone or PDFBox standalone?  Does the file work there?

 On Apr 26, 2010, at 8:35 AM, Marc Ghorayeb wrote:

 
  Okay, I've been digging a little bit through the Java code from the SVN,
 and it seems the load function inside the ExtractingDocumentLoader class
 does not receive the ContentStream (it is set to null...). Maybe I should
 send this to the developer mailing list?
  Marc
 
  From: dekay...@hotmail.com
  To: solr-user@lucene.apache.org
  Subject: RE: Problem with pdf, upgrading Cell
  Date: Fri, 23 Apr 2010 16:03:28 +0200
 
 
  Seems like I'm not the only one with this no-extraction problem:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
 Apparently he tried the same thing, building from the trunk and indexing a
 PDF, and no extraction occurred... Strange.
  Marc G.
 

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem using Solr/Lucene:
 http://www.lucidimagination.com/search




Solr date representation

2010-04-30 Thread Toby White
Don't know if this counts as a bug report or not - it's certainly a  
corner case, but it's just bitten me.


http://wiki.apache.org/solr/IndexingDates suggests that the canonical  
form of a date is a string like: 1995-12-31T23:59:59Z


and says that this is a restricted form of the canonical  
representation of dateTime from XML Schema.


The latter explicitly says '0001' is the lexical representation of  
the year 1 of the Common Era (1 CE, sometimes written AD 1 or 1 AD)


However, if I put a document into Solr (1.4 release) with a datetime  
field of


0001-01-01T00:00:00Z

then on retrieving that document, I get back the value

1-01-01T00:00:00Z

(i.e. no leading zeroes), which tripped up my date-parsing routines.
Leading zeroes seem to be universally dropped; all dates before
1000 AD seem to have the equivalent problem.


Is this a bug in the code, or a bug in the documentation?

Toby

--
http://timetric.com
2nd Floor, White Bear Yard, 144a Clerkenwell Road, London EC1R 5DF
twitter: @timetric, @tow21 | skype: tobyohwhite



Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Marc Ghorayeb

Hi,
Nope, I didn't get it to work... Just like you, the command-line version of
Tika extracts the content correctly, but once included in Solr, no content is
extracted.
What I have tried until now:
- Updating the Tika libraries inside the Solr 1.4 public version; no luck
  there.
- Downloading the latest SVN version, compiling it, and starting from a simple
  schema; still no luck.
- Getting other versions compiled on Hudson (nightly builds) and testing them
  too; still no extraction.
I sent a mail to the developers' mailing list but they told me I should just
mail here. I hope some developer reads this, because it's quite an important
feature of Solr, and somehow it got broken between the 1.4 release and the
last version in the SVN.
Marc

Re: Elevation of of part match

2010-04-30 Thread MitchK

Gert, could you provide the solrconfig and schema specifications you have
made?
If the wiki really means what it says, the behaviour you want should be
possible.

But that's only a guess.

Btw: the standard field definition for the elevation component in the
example directory is string. That means there is no tokenization, and
accordingly a partial match is not possible.

Hope that helps
- Mitch 


Re: ubuntu lucid package

2010-04-30 Thread pablo platt
Which parts don't work for you?
If there are bugs in the package it would be great if you could report them to
make it better.

On Fri, Apr 30, 2010 at 1:50 PM, Olivier Dobberkau olivier.dobber...@dkd.de
 wrote:


 Am 30.04.2010 um 09:24 schrieb Gora Mohanty:

  Also, the standard Debian/Ubuntu way of finding out what files a
  package installed is:
   dpkg -L pkg_name
 
  Regards,
  Gora

 You might try:

 # dpkg -L solr-common
 [...]

 If I reckon correctly, some parts of Apache Solr will not work with the
 Ubuntu Lucid distribution.

 http://solr.dkd.local/update/extract throws an error:

 The server encountered an internal error (lazy loading error
 org.apache.solr.common.SolrException: lazy loading error at

 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
 at

 Maybe someone from ubuntu reading this list can confirm this.

 Olivier

RE: benefits of float vs. string

2010-04-30 Thread Nagelberg, Kallin
When using numerical types you can do range queries like 3 < myfield <= 10, as
well as a lot of other interesting mathematical functions that would not be
possible with a string type.
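
In Solr's query syntax that looks like this (the field name is made up; note
that in 1.4 both endpoints of a range are either inclusive or exclusive):

    q=myfield:[3 TO 10]    (inclusive)
    q=myfield:{3 TO 10}    (exclusive)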

Thanks for the info Yonik,
-Kallin Nagelberg

-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Friday, April 30, 2010 1:27 AM
To: solr-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: benefits of float vs. string

Please explain a range query? 

tia :-)

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 4/29/10, Yonik Seeley yo...@lucidimagination.com wrote:

 From: Yonik Seeley yo...@lucidimagination.com
 Subject: Re: benefits of float vs. string
 To: solr-user@lucene.apache.org
 Date: Thursday, April 29, 2010, 1:01 PM
 On Wed, Apr 28, 2010 at 11:22 AM,
 Nagelberg, Kallin
 knagelb...@globeandmail.com
 wrote:
  Does anyone have an idea about the performance
 benefits of searching across floats compared to strings? I
 have one multi-valued field that contains about 3000
 distinct IDs across 5 million documents. I am going to be doing a
 lot of queries like q=id:102 OR id:303 OR id:305, etc. Right
 now it is a String but I am going to switch to a float as
 intuitively it ought to be easier to filter a number than a
 string.
 
 
 There won't be any difference in search speed for term
 queries as you
 show above.
 If you don't need to do sorting or range queries on that
 field, I'd
 leave it as a String.
 
 
 -Yonik
 Apache Lucene Eurocon 2010
 18-21 May 2010 | Prague
 


prefixing with dismax

2010-04-30 Thread Nagelberg, Kallin
Hey,

I've been using the dismax query parser so that I can pass a user-created
search string directly to Solr. Now I'm getting the requirement that something
like 'Bo' must match 'Bob', or 'Bob Jo' must match 'Bob Jones'. I can't think
of a way to make this happen with dismax, though it's pretty simple with the
standard syntax: I guess I would just split on spaces and create ANDed terms
like 'myfield:token*'. This doesn't feel like a great approach though, since
I'm losing all of the escaping magic of dismax. Does anyone have any cleaner
solutions to this sort of problem? I imagine it's quite common.
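
For concreteness, the fallback I have in mind is roughly this (a sketch; the
field name is made up, and the escaping is hand-rolled precisely because I'd
be bypassing dismax):

    public class PrefixQueryBuilder {
        // Build a standard-syntax prefix query from raw user input,
        // e.g. "Bob Jo" -> "myfield:Bob* AND myfield:Jo*".
        public static String build(String input) {
            StringBuilder sb = new StringBuilder();
            for (String token : input.trim().split("\\s+")) {
                if (token.length() == 0) continue;
                if (sb.length() > 0) sb.append(" AND ");
                sb.append("myfield:").append(escape(token)).append('*');
            }
            return sb.toString();
        }

        // Escape Lucene query syntax characters within a single token.
        private static String escape(String s) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if ("\\+-!():^[]\"{}~*?|&".indexOf(c) >= 0) sb.append('\\');
                sb.append(c);
            }
            return sb.toString();
        }
    }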

Thanks,
Kallin Nagelberg


RE: Elevation of of part match

2010-04-30 Thread MitchK

The elevate.xml example says:

<!-- If this file is found in the config directory, it will only be
     loaded once at startup.  If it is found in Solr's data
     directory, it will be re-loaded every commit.
-->

Did you make a restart?


Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Grant Ingersoll
Praveen and Marc,

Can you share the PDF (feel free to email my private email) that fails in Solr?

Thanks,
Grant


On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:

 
 Hi,
 Nope, I didn't get it to work... Just like you, the command-line version of
 Tika extracts the content correctly, but once included in Solr, no content is
 extracted.
 What I have tried until now:
 - Updating the Tika libraries inside the Solr 1.4 public version; no luck
   there.
 - Downloading the latest SVN version, compiling it, and starting from a
   simple schema; still no luck.
 - Getting other versions compiled on Hudson (nightly builds) and testing
   them too; still no extraction.
 I sent a mail to the developers' mailing list but they told me I should just
 mail here. I hope some developer reads this, because it's quite an important
 feature of Solr, and somehow it got broken between the 1.4 release and the
 last version in the SVN.
 Marc

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



How is DeletionPolicy supposed to work?

2010-04-30 Thread Paleo Tek

Hi folks,

In moving to 1.4, it was unclear to me how deletionPolicy was supposed 
to work.  I commit/optimize on a build server, then replicate to 
multiple search servers.  I don't need anything fancy for a deletion 
policy:  save one copy, and replicate on copy.   But when I used no 
policy, sometimes the index would be twice the normal size.  In an 
effort to eliminate that, I put in the explicit deletion below.  But it 
STILL sometimes creates an index of double the size.  This is causing
space problems on some of my replicated servers.


Can someone please explain what configuration I should apply to not ever 
save any extra commits or optimized commits, so that my index and all 
replicated copies of it will have a size of 1 index, rather than 2 
indexes?  A summary of the theory behind that would be most welcome 
too.  Thanks!


   -Jim


The deletion policy stanza from mainIndex in solrconfig.xml:


   <deletionPolicy class="solr.SolrDeletionPolicy">
     <!-- The number of commit points to be kept -->
     <str name="maxCommitsToKeep">0</str>
     <!-- The number of optimized commit points to be kept -->
     <str name="maxOptimizedCommitsToKeep">1</str>
     <!--
       Delete all commit points once they have reached the given age.
       Supports DateMathParser syntax e.g.

       <str name="maxCommitAge">30MINUTES</str>

       <str name="maxCommitAge">1DAY</str>
     -->
   </deletionPolicy>





Re: Trouble with parenthesis

2010-04-30 Thread Yonik Seeley
Pure negatives in lucene syntax don't match anything (solr currently
only fixes this for you if it's a pure negative at the top level, not
embedded).

Try changing
(NOT periodicite:annuel)
to
(*:* NOT periodicite:annuel)

But the second version below where you just removed the parens will be
more efficient.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague



On Fri, Apr 30, 2010 at 1:49 AM, mailing-list gboyr...@andevsol.com wrote:
 Hi everybody,

 We got a problem with parentheses in a lucene/solr request (Solr 1.4):
 - {!lucene q.op=AND}( ville:Moscou -periodicite:annuel) gives 254
 documents, with parsedquery +ville:Moscou -periodicite:annuel in debug
 mode. That's correct.
 - {!lucene q.op=AND} (ville:Moscou AND NOT periodicite:annuel) same
 results.
 - {!lucene q.op=AND} (ville:Moscou AND (NOT periodicite:annuel)) gives 0
 documents, with parsedquery +ville:Moscou +(-periodicite:annuel)

 The 2 fields are standard string fields in the solr schema.

 Is this an issue, or the standard behaviour of the Solr query parser?

 Best regards.
 Gilbert Boyreau



RE: Elevation of of part match

2010-04-30 Thread Villemos, Gert
Yes, I restarted. To make sure, I just did it again. Same result: archive
elevates, packet archive doesn't.
 
G.
 



From: MitchK [mailto:mitc...@web.de]
Sent: Fri 4/30/2010 5:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Elevation of of part match




The elevate.xml example says:

<!-- If this file is found in the config directory, it will only be
     loaded once at startup.  If it is found in Solr's data
     directory, it will be re-loaded every commit.
-->

Did you make a restart?








Re: How is DeletionPolicy supposed to work?

2010-04-30 Thread Yonik Seeley
Simply use the default from the example solrconfig.xml... there
is no need to modify it unless you are doing something advanced.  In
the config below, you show maxOptimizedCommitsToKeep=1, which will
increase index size by always keeping around one optimized commit
point.
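
For reference, if I remember right the 1.4 example solrconfig.xml ships with
the opposite values, i.e. keep one commit point and no extra optimized ones:

    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>

Check your copy of the example config to be sure.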

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague

On Fri, Apr 30, 2010 at 11:40 AM, Paleo Tek paleo...@gmail.com wrote:
 Hi folks,

 In moving to 1.4, it was unclear to me how deletionPolicy was supposed to
 work.  I commit/optimize on a build server, then replicate to multiple
 search servers.  I don't need anything fancy for a deletion policy:  save
 one copy, and replicate on copy.   But when I used no policy, sometimes the
 index would be twice the normal size.  In an effort to eliminate that, I put
 in the explicit deletion below.  But it STILL sometimes creates an index of
  double the size.  This is causing space problems on some of my replicated
 servers.

 Can someone please explain what configuration I should apply to not ever
 save any extra commits or optimized commits, so that my index and all
 replicated copies of it will have a size of 1 index, rather than 2 indexes?
  A summary of the theory behind that would be most welcome too.  Thanks!

           -Jim


 The deletion policy stanza from mainIndex in solrconfig.xml:


    <deletionPolicy class="solr.SolrDeletionPolicy">
      <!-- The number of commit points to be kept -->
      <str name="maxCommitsToKeep">0</str>
      <!-- The number of optimized commit points to be kept -->
      <str name="maxOptimizedCommitsToKeep">1</str>
      <!--
          Delete all commit points once they have reached the given age.
          Supports DateMathParser syntax e.g.
          <str name="maxCommitAge">30MINUTES</str>
          <str name="maxCommitAge">1DAY</str>
      -->
    </deletionPolicy>






Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Praveen Agrawal
Grant,
You can try any of the sample PDFs that come in the docs/ folder of the Solr
1.4 distribution. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf',
etc. Only the metadata (stream_size, content_type) plus my own literals gets
indexed; the content is missing.


On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll gsing...@apache.orgwrote:

 Praveen and Marc,

 Can you share the PDF (feel free to email my private email) that fails in
 Solr?

 Thanks,
 Grant


 On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:

 
  Hi
  Nope, I didn't get it to work... Just like you, the command-line version of
 Tika extracts the content correctly, but once included in Solr, no content
 is extracted.
  What I have tried until now:
  - Updating the Tika libraries inside the Solr 1.4 public version; no luck
    there.
  - Downloading the latest SVN version, compiling it, and starting from a
    simple schema; still no luck.
  - Getting other versions compiled on Hudson (nightly builds) and testing
    them too; still no extraction.
  I sent a mail to the developers' mailing list but they told me I should
 just mail here. I hope some developer reads this, because it's quite an
 important feature of Solr, and somehow it got broken between the 1.4
 release and the last version in the SVN.
  Marc

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem using Solr/Lucene:
 http://www.lucidimagination.com/search




Re: StreamingUpdateSolrServer hangs

2010-04-30 Thread Yonik Seeley
On Thu, Apr 29, 2010 at 7:51 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 I'm trying to reproduce now... single thread adding documents to a
 multithreaded client, StreamingUpdateSolrServer(addr,32,4)

 I'm currently at the 2.5 hour mark and 100M documents - no issues so far.

I let it go to 500M docs... everything works fine (this is with the
current trunk).

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague


Re: thresholding results by percentage drop from maxScore in lucene/solr

2010-04-30 Thread MitchK

Mike,

why not order by the number of found items in your facets? If you get too
many facets, just throw away the ones with the smallest values when there is
not enough room for them.
I suggest that because you don't know every search case.

Sometimes the user does not really know what he is searching for, or how to
make his search more specific, and faceting helps him navigate the
search result.

Just some thoughts. :-)

- Mitch


RE: Elevation of of part match

2010-04-30 Thread MitchK

Sorry, since I have no experience with the elevation component, I can't help
you with this. Even searching the mailing list turns up no useful
information...


Custom SolrQueryRequest/SolrQueryResponse

2010-04-30 Thread Aaron Hiniker
Solr team,

Long time, first time. Many thanks for all your work on creating this
excellent search appliance.

The 40,000 ft view of my problem is that I need to execute multiple queries per
endpoint invocation, with the results for each query grouped in the response
output as if they were individual calls (think “composite” request and
response), wrapped by a composite tag, etc.

So:

Normal Query (single query input):

<results>
...
</results>

Composite Query (multiple query input):

<composite>
   <results>...</results>
   <results>...</results>
   <results>...</results>
</composite>

I’ve already created a custom Handler and Writer for our “single”,
non-composite needs, but now I need to modify the behavior so that if multiple
search queries are specified (i.e. q=query1;query2;query3, etc.), the service
will invoke all the queries and return all the result sets in a single
invocation.

Herein lies the problem, from what I can tell: I don’t have any control over
SolrQueryRequest or SolrQueryResponse. My initial attempts have me subclassing
both of these to hold a List of requests and responses, with a cursor that
moves the “current” req/res forward each time through my handler. All methods
are implemented to delegate directly to the req/res that the cursor is
pointing to. In the writer I would check, via instanceof, whether we are
dealing with a normal or a composite query, and dump the results appropriately.
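
In code, the shape of what I'm attempting is roughly this (a heavily
abbreviated sketch using a dynamic proxy so I don't have to hand-write every
delegating method; the class name is made up):

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.util.List;
    import org.apache.solr.request.SolrQueryRequest;

    public class CompositeRequest {
        private final List<SolrQueryRequest> requests;
        private int cursor = 0;

        public CompositeRequest(List<SolrQueryRequest> requests) {
            this.requests = requests;
        }

        // Move the cursor to the next sub-request; false when exhausted.
        public boolean advance() { return ++cursor < requests.size(); }

        // A SolrQueryRequest view that always delegates to the current
        // sub-request.
        public SolrQueryRequest asRequest() {
            return (SolrQueryRequest) Proxy.newProxyInstance(
                SolrQueryRequest.class.getClassLoader(),
                new Class<?>[] { SolrQueryRequest.class },
                new InvocationHandler() {
                    public Object invoke(Object proxy, Method m, Object[] args)
                            throws Throwable {
                        return m.invoke(requests.get(cursor), args);
                    }
                });
        }
    }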

To pull this off, it appears I would need to modify SolrDispatchFilter to allow 
for a configurable factory(?) for my custom SolrQueryRequest and 
SolrQueryResponse objects.  Can this be solved some other way without code 
modifications?  If code modifications are required, do you have any suggestions 
on how the configuration file entry might look, etc?  I can write the patch but 
wanted to get your feedback before going any further with this.

Thanks

Aaron


Re: Solr date representation

2010-04-30 Thread Chris Hostetter

: then on retrieving that document, I get back the value
: 
: 1-01-01T00:00:00Z
: 
: (i.e. no leading zeroes), which tripped up my date-parsing routines.
: Leading zeroes seem to be universally dropped; all dates before 1000 AD seem
: to have the equivalent problem.
: 
: Is this a bug in the code, or a bug in the documentation?

It's a bug in the code; thanks for pointing this out...

https://issues.apache.org/jira/browse/SOLR-1899



-Hoss