Re: Is semicolon a character that needs escaping?

2010-09-08 Thread Michael Lackhoff
On 08.09.2010 00:05 Chris Hostetter wrote:

 
 : Subject: Is semicolon a character that needs escaping?
   ...
 : From this I conclude that there is a bug either in the docs or in the
 : query parser or I missed something. What is wrong here?
 
 Back in Solr 1.1, the standard query parser treated ; as a special 
 character and looked for sort instructions after it.  
 
 Starting in Solr 1.2 (released in 2007) a sort param was added, and 
 semicolon was only considered a special character if you did not 
 explicitly mention a sort param (for back compatibility)
 
 Starting with Solr 1.4, the default was changed so that semicolon wasn't 
 considered a meta-character even if you didn't have a sort param -- you 
 have to explicitly select the lucenePlusSort QParser to get this 
 behavior.
 
 I can only assume that if you are seeing this behavior, you are either 
 using a very old version of Solr, or you have explicitly selected the 
 lucenePlusSort parser somewhere in your params/config.
 
 This was heavily documented in CHANGES.txt for Solr 1.4 (you can find 
 mention of it when searching for either ; or semicolon)
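For anyone following along, the difference between the two styles looks roughly like this (host, field and URL-escaping details are made up for illustration):

```
# Solr 1.1 / lucenePlusSort style: sort instructions ride along in q after a semicolon
http://localhost:8983/solr/select?q=title:foo;price+desc

# Solr 1.2+ style: sort is its own parameter, and a semicolon in q is just query text
http://localhost:8983/solr/select?q=title:foo&sort=price+desc
```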

I am using 1.3 without a sort param which explains it, I think. It would
be nice to update to 1.4 but we try to avoid such actions on a
production server as long as everything runs fine (the semicolon thing
was only reported recently).

Many thanks for your detailed explanation!
-Michael


Re: Distance sorting with spatial filtering

2010-09-08 Thread Scott K
I get the error on all functions.
GET 'http://localhost:8983/solr/select?q=*:*&sort=sum(1)+asc'
Error 400 can not sort on unindexed field: sum(1)

I tried another nightly build from today, Sep 7th, with the same
results. I attached the schema.xml

Thanks for the help!
Scott

On Wed, Sep 1, 2010 at 18:43, Lance Norskog goks...@gmail.com wrote:
 Post your schema.

 On Mon, Aug 30, 2010 at 2:04 PM, Scott K s...@skister.com wrote:
 The new spatial filtering (SOLR-1586) works great and is much faster
 than fq={!frange. However, I am having problems sorting by distance.
 If I try
 GET 
 'http://localhost:8983/solr/select/?q=*:*&sort=dist(2,latitude,longitude,0,0)+asc'
 I get an error:
 Error 400 can not sort on unindexed field: dist(2,latitude,longitude,0,0)

 I was able to work around this with
 GET 'http://localhost:8983/solr/select/?q=*:* AND _val_:recip(dist(2,
 latitude, longitude, 0,0),1,1,1)&fl=*,score'

 But why isn't sorting by functions working? I get this error with any
 function I try to sort on. This is a nightly trunk build from Aug 25th.
 I see SOLR-1297 was reopened, but that seems to be for edge cases.

 Second question: I am using the LatLonType from the Spatial Filtering
 wiki, http://wiki.apache.org/solr/SpatialSearch
 Are there any distance sorting functions that use this field, or do I
 need to have three indexed fields, store_lat_lon, latitude, and
 longitude, if I want both filtering and sorting by distance.

 Thanks, Scott




 --
 Lance Norskog
 goks...@gmail.com

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 PERFORMANCE NOTE: this schema includes many optional features and should not
 be used for benchmarking.  To improve performance one could
  - set stored="false" for all fields possible (esp large fields) when you
only need to search on the field but don't need to return the original
value.
  - set indexed="false" if you don't need to search on the field, but only
return the field as a result of searching on other indexed fields.
  - remove all unneeded copyField statements
  - for best index size and searching performance, set "index" to false
for all general text fields, use copyField to copy them to the
catchall "text" field, and use that for searching.
  - For maximum indexing performance, use the StreamingUpdateSolrServer
java client.
  - Remember to run the JVM in server mode, and use a higher logging level
that avoids logging every request
-->

<schema name="schema" version="1.2">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <!-- Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
    <fieldtype name="binary" class="solr.BinaryField"/>
    <!--
      Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
    -->
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
    <!-- A Trie based date field for faster date range queries and date faceting. -->
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />

    <fieldType name="location" class="solr.LatLonType" subFieldType="double" />


    <!-- A text field that only splits on whitespace for exact matching of words -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter 

Re: Alphanumeric wildcard search problem

2010-09-08 Thread Hasnain

The real problem was this tag

<!-- field for the QueryParser to use when an explicit fieldname is absent
-->
<defaultSearchField>text</defaultSearchField>

and I was querying like q=r-1* instead of q=mat_nr:r-1*,
so whatever fieldType I used for mat_nr, it was using the text fieldType, which
has WordDelimiterFilterFactory; hence I had to put a space in order to get it
running.
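To illustrate the effect (field values are made up): with an index-time WordDelimiterFilterFactory, a value like r-1234 gets indexed in the text field as separate tokens, so an unprefixed wildcard term never finds a single matching token:

```
# goes against defaultSearchField "text"; r-1234 was indexed there as the
# tokens "r" and "1234", so the prefix term r-1* matches nothing
q=r-1*

# goes against mat_nr, whose fieldType keeps r-1234 as one token
q=mat_nr:r-1*
```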
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1437584.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandlerException for custom DIH Transformer

2010-09-08 Thread Shashikant Kore
Resurrecting an old thread.

I faced the exact problem Tommy did, and the jar was in {solr.home}/lib as Noble
had suggested.

My custom transformer overrides the following method, as per the specification of
the Transformer class.

public Object transformRow(Map<String, Object> row, Context
context);

But, in the code (EntityProcessorWrapper.java), I see the following line.

  final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class);

This doesn't match the method signature in Transformer. I think this should
be

  final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class,
Context.class);

I have verified that adding a method transformRow(Map<String, Object> row)
works.
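The lookup mismatch is easy to reproduce with plain reflection. A minimal sketch (class and method names here are hypothetical, and DIH's real Context type is replaced by Object): Class.getMethod requires the exact parameter-type list, so asking for a one-argument transformRow fails when only the two-argument version is declared.

```java
import java.lang.reflect.Method;
import java.util.Map;

public class TransformerLookupDemo {
    // Stand-in for a custom Transformer with the documented two-arg signature.
    public static class MyTransformer {
        public Object transformRow(Map<String, Object> row, Object context) {
            return row;
        }
    }

    // Mimics EntityProcessorWrapper's reflective lookup of "transformRow".
    public static Method lookup(Class<?> clazz, Class<?>... params) {
        try {
            return clazz.getMethod("transformRow", params);
        } catch (NoSuchMethodException e) {
            return null; // this is what surfaces as the DIH exception
        }
    }

    public static void main(String[] args) {
        // One-arg lookup fails: the class only declares the two-arg method.
        System.out.println(lookup(MyTransformer.class, Map.class));
        // Two-arg lookup succeeds.
        System.out.println(lookup(MyTransformer.class, Map.class, Object.class));
    }
}
```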

Am I missing something?

--shashi

2010/2/8 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 On Mon, Feb 8, 2010 at 9:13 AM, Tommy Chheng tommy.chh...@gmail.com
 wrote:
   I'm having trouble making a custom DIH transformer in solr 1.4.
 
  I compiled the General TrimTransformer into a jar. (just copy/paste
 sample
  code from http://wiki.apache.org/solr/DIHCustomTransformer)
  I placed the jar along with the dataimporthandler jar in solr/lib (same
  directory as the jetty jar)

 do not keep it in solr/lib, it won't work. Keep it in {solr.home}/lib
 
  Then I added to my DIH data-config.xml file:
  transformer=DateFormatTransformer, RegexTransformer,
  com.chheng.dih.transformers.TrimTransformer
 
  Now I get this exception when I try running the import.
  org.apache.solr.handler.dataimport.DataImportHandlerException:
  java.lang.NoSuchMethodException:
  com.chheng.dih.transformers.TrimTransformer.transformRow(java.util.Map)
 at
 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.loadTransformers(EntityProcessorWrapper.java:120)
 
  I noticed the exception lists TrimTransformer.transformRow(java.util.Map)
  but the abstract Transformer class defines a two parameter method:
   transformRow(Map<String, Object> row, Context context)?
 
 
  --
  Tommy Chheng
  Programmer and UC Irvine Graduate Student
  Twitter @tommychheng
  http://tommy.chheng.com
 
 

 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com



Re: stream.url

2010-09-08 Thread satya swaroop
Hi Hoss,

 Thanks for the reply, and it got working. The reason was, as you
said, that I was not double escaping; I used %2520 for whitespace and it is
working now.
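For anyone hitting the same thing, the double escaping can be sketched in a few lines (the file name is made up). Encoding once turns a space into %20; encoding the already-escaped string again turns its '%' into %25, giving %2520:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DoubleEscapeDemo {
    // Percent-encode once, using %20 for spaces (URLEncoder emits '+').
    public static String encodeOnce(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String raw = "my file.pdf";      // hypothetical remote file name
        String once = encodeOnce(raw);   // space -> %20
        String twice = encodeOnce(once); // '%' -> %25, so %20 becomes %2520
        System.out.println(once + " / " + twice);
    }
}
```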

Thanks,
satya


Re: Query result ranking - Score independent

2010-09-08 Thread Alessandro Benedetti
My request was very simple:
q= astronomy^0
And Solr returned the exception.
Maybe the zero boost factor is not causing the exception?

1) We indexed n documents with a Schema.xml.
2) Then we changed some field type in the Schema.xml
3) Then we indexed other m documents

Maybe this could cause the exception?



2010/9/7 Grant Ingersoll gsing...@apache.org


 On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote:

  Hi all,
  I need to retrieve query-results with a ranking independent from each
  query-result's default lucene score, which means assigning the same score
 to
  each query result.
  I tried to use a zero boost factor ( ^0 ) to reset to zero each
  query-result's score.
  This strategy seems to work within the example solr instance, but in my
  Solr instance, using a zero boost factor causes a Buffer Exception
  (
  HTTP Status 500 - null java.lang.IllegalArgumentException at
  java.nio.Buffer.limit(Buffer.java:249) at
 
 org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123)
  at
 
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
  at
 
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
  at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at
  org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at
  org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at
  org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948) at
 
 org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
  at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
  )

 Hmm, that stack trace doesn't align w/ the boost factor.  What  was your
 request?  I think there might be something else wrong here.

  Do you know any other technique to reset to some fixed constant value,
 all
  the query-result's scores?
  Each query result should obtain the same score.
  Any suggestion?


 The ConstantScoreQuery or a Filter should do this.  You could do something
 like:

 q=*:*&fq=the real query, as in q=*:*&fq=field:foo

 -Grant


 --
 Grant Ingersoll
 http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8




-- 
--

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Phrase search + multi-word index time expanded synonym

2010-09-08 Thread Xavier Schepler

Hello,

well, first, here's the field type that is searched :

<fieldtype name="SyFR" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<!-- Synonyms -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms-fr.txt" 
ignoreCase="true" expand="true"/>

<filter class="solr.LowerCaseFilterFactory"/>
<charFilter class="solr.MappingCharFilterFactory" 
mapping="mapping-ISOLatin1Accent.txt"/>

</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<charFilter class="solr.MappingCharFilterFactory" 
mapping="mapping-ISOLatin1Accent.txt"/>

</analyzer>
</fieldtype>

here's the synonym from the synonyms-fr.txt file :

...
PS,Parti socialiste
...

and here's the query :

"PS et"

It returns no result, whereas "Parti socialiste et" returns the results.

How can I have both queries working? I'm thinking about different 
configurations but I haven't found a solution at the moment.
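If it helps, here is a rough sketch of why the phrase fails, assuming the document text contains "Parti socialiste et" and the index-time SynonymFilter (expand="true") stacks PS on the first matched token without shifting the later positions (positions are illustrative, not copied from analysis.jsp):

```
position 1: Parti | PS      <- PS stacked on Parti
position 2: socialiste
position 3: et

phrase "Parti socialiste et" -> positions 1,2,3 line up -> match
phrase "PS et"               -> needs et right after PS (position 2),
                                but et sits at position 3 -> no match
```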

Thx for reading,

Xavier Schepler


Creating a sub-index from another

2010-09-08 Thread Santiago Pérez

Hej,

I have a Solr index with several million documents. I need to implement some
text mining processes and I would like to create a one-million-document index
from the original for some tests.

How can I do it?

Thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Creating-a-sub-index-from-another-tp1438386p1438386.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr searching harri finds harry

2010-09-08 Thread Grijesh.singh

You have not provided much detail about the analysis of that field, but I am sure
the problem is because of stemming.
You can check via the analysis page or with the debugQuery=on parameter.

To prevent stemming, you have to put the words you do not want stemmed into
protwords.txt

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-searching-harri-finds-harry-tp1438486p1438637.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Phrase search + multi-word index time expanded synonym

2010-09-08 Thread Xavier Schepler

On 08/09/2010 12:21, Grijesh.singh wrote:

see the analysis.jsp with debug verbose and see what happens at index time
and search time during analysis with your data

Also you can use debugQuery=on to see what the actually parsed query is.

-
Grijesh
   
I've found a first solution by myself, using the query analyzer, that 
works for pairs of synonyms. I still have to test it with rows of 3 or 4 
equivalent synonyms.

I used analysis.jsp.

The query time analyzer became :

<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms2-fr.txt" 
ignoreCase="true" expand="true"/>

<filter class="solr.LowerCaseFilterFactory"/>
<charFilter class="solr.MappingCharFilterFactory" 
mapping="mapping-ISOLatin1Accent.txt"/>

</analyzer>

And the synonyms2-fr.txt contains :

PS = Parti socialiste

Thxs for your reply.


Re: Creating a sub-index from another

2010-09-08 Thread Grijesh.singh

You need a separate Solr core for that,
and you have to write a processor which reads your original index, then
generates the XML data and pushes it to the new core. That is the simple way that I
have used many times.

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Creating-a-sub-index-from-another-tp1438386p1438673.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr searching harri finds harry

2010-09-08 Thread Kura
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I have harry as a protected word in protwords.txt

Here is the xml definition for my text column

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true" />
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
</fieldType>

On 08/09/10 11:29, Grijesh.singh wrote:
 
 U have not provided much detail about analysis of that field,but I am sure
 that problem because of stemming
 u can see by analysis page or by debugQuery=on parameter.
 
 To prevent stemming u have to put words in protword.txt on which u do not
 need any stemming
 
 -
 Grijesh
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkyHaHIACgkQLOut9Un89NmR6wCgjOS+znMEqUQKn3ACzWudAaa4
faMAn2d0LX76ZBmiL+j/EtmVpvIpHiub
=5ymy
-END PGP SIGNATURE-


0x49FCF4D9.asc
Description: application/pgp-keys


0x49FCF4D9.asc.sig
Description: PGP signature


RE: Advice requested. How to map 1:M or M:M relationships with support for facets

2010-09-08 Thread Tim Gilbert
Thank you for your advice.

Tim
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Tuesday, September 07, 2010 11:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Advice requested. How to map 1:M or M:M relationships with
support for facets

These days the best practice for a 'drill-down' facet in a UI is to 
encode both the unique value of the facet and the displayable string 
into one facet value. In the UI, you unpack and show the display string,

and search with the full facet string.

If you want to also do date ranges, make a separate matching 'date' 
field. This will store the date twice. Solr schema design is all about 
denormalizing.
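A rough sketch of that pack/unpack idea (method names are mine, and it assumes the delimiter never occurs in the id):

```java
public class FacetValueDemo {
    // Pack a stable id and a display label into one facet value.
    public static String pack(String id, String display) {
        return id + "|" + display;
    }

    // Unpack for UI display; split on the first '|' only, so the
    // label itself may contain further pipes.
    public static String[] unpack(String facetValue) {
        return facetValue.split("\\|", 2);
    }

    public static void main(String[] args) {
        String v = pack("42", "Attended Conference");
        String[] parts = unpack(v);
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

The UI shows parts[1] and filters with the full packed value.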

Tim Gilbert wrote:

 Hi guys,

 *Question:*

 What is the best way to create a solr schema which supports a 
 'multivalue' where the value is a two item array of event category and
 a date. I want to have faceted searches, counts and Date Range ability
 on both the category and the dates.

 *Details:*

 This is a person database where a Person can have details about them 
 (like address) and a Person has many Events. Events have a category 
 (type of event) and a Date for when that event occurred. At the bottom
 you will see a simple diagram showing the relationship. Briefly, a 
 Person has many Events and Events have a single category and a single 
 person.

 What I would like to be able to do is:

 Have a facet which shows all of the event categories, with a 
 'sub-facet' that shows Category + date. For example, if a Category was 
 "Attended Conference" and the date was 2008-09-08, I'd be able to show a 
 count of all "Attended Conference", then have a tree type control and 
 show the years (for example):

 Eg.

 + Attended Conference (1038)

 |

 + 2010 (100)

 +--- 2009 (134)

 +--- 2008 (234)

 |

 + Another Event Category (23432)

 |

 +-2010 (234)

 +2009 (245)

 Etc.

 For scale, I expect to have < 100 Event Categories and > a million 
 person_event records on > 250,000 persons. I don't care very much 
 about disk space, so if it's 1 GB or 100 GB due to indexing, that's 
 okay if the solution works (and it's fast! :)

 *Solutions I looked at:*

 * I looked at poly fields but they seem to be a fixed length and appeared
   to be the same type. The typical use case was latitude & longitude.
   I don't think this will work because there are a variable number
   of events attached to a person.
 * I looked at multiValued but it didn't seem to permit two fields
   having a relationship, i.e. Event Category & Event Date. It
   seemed to me that they need to be broken out. That's not
   necessarily a bad thing, but it didn't seem ideal.
 * I thought about concatenating category & date to create fake
   fields strictly for faceting purposes, but I believe that will
   break date ranges. E.g. EventCategoryId + "|" + Date = "1|2009" as
   a facet would allow me to show counts for that event type. Seems
   a bit unwieldy to me...
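For what it's worth, the concatenated-value idea in that last bullet can still be drilled into with facet.prefix; the field name and ids here are made up:

```
# top level: counts per packed category|year value
facet=true&facet.field=event_cat_year

# drill into category id 1 to get its per-year counts
# ('|' URL-encoded as %7C)
facet=true&facet.field=event_cat_year&facet.prefix=1%7C
```

Real date-range queries would still need a separate date field.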

 What's the group's advice for handling this situation in the best way?

 Thanks in advance; as always, sorry if this question has been asked and
 answered a few times already. I googled for a few hours before writing
 this... but things change so fast with Solr that any article older than 
 a year was suspect to me; also there are so many patches that provide 
 additional functionality...

 Tim

 Schema:



Re: Multi core schema file

2010-09-08 Thread Grijesh.singh

solr.xml allows you to mention the other properties as well, like
instanceDir, config, schema in the <cores>/<core> tags.

So, sharing the entire conf dir may not be possible, but it is
possible to share solrconfig.xml and schema.xml.

You can see the detailed parameters at the wiki page
http://wiki.apache.org/solr/CoreAdmin

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-core-schema-file-tp1438460p1438720.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr searching harri finds harry

2010-09-08 Thread Grijesh.singh

Have you restarted Solr after adding the words to protwords, and reindexed the data?

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-searching-harri-finds-harry-tp1438486p1438735.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr searching harri finds harry

2010-09-08 Thread Kura
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Yes to restart, no to re-index. Was hoping that wouldn't be necessary.
I'll do that now.

On 08/09/10 11:48, Grijesh.singh wrote:
 
 have u restart the solr after adding words in protwords and reindex the data?
 
 -
 Grijesh
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkyHbEEACgkQLOut9Un89NmAaACfdl5P/GOikHvBHu0A9/6ma30q
jXYAoIAbN8tAnMc4ecqwJ4Q8r/Un3Cio
=vmU8
-END PGP SIGNATURE-


0x49FCF4D9.asc
Description: application/pgp-keys


0x49FCF4D9.asc.sig
Description: PGP signature


Re: Solr searching harri finds harry

2010-09-08 Thread Grijesh.singh

Yes, reindexing is necessary after a protwords/synonym update.

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-searching-harri-finds-harry-tp1438486p1438802.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Batch update, order of evaluation

2010-09-08 Thread Erick Erickson
This would be surprising behavior, if you can reliably reproduce this
it's worth a JIRA.

But (and I'm stretching a bit here) are you sure you're committing at the
end of the batch AND are you sure you're looking after the commit? Here's
the scenario: Your updated document is at positions 1 and 100 in your batch.
Somewhere around SOLR processing document 50, an autocommit occurs,
and you're looking at your results before SOLR gets around to committing
document 100. Like I said, it's a stretch.

To test this, you need to be absolutely sure of two things before you
search:
1) the batch is finished processing
2) you've issued a commit after the last document in the batch.

If you're sure of the above and still see the problem, please let us know...

HTH
Erick

On Tue, Sep 7, 2010 at 10:32 PM, Greg Pendlebury
greg.pendleb...@gmail.com wrote:

 Does anyone know with certainty how (or even if) order is evaluated when
 updates are performed by batch?

 Our application internally buffers solr documents for speed of ingest
 before
 sending them to the server in chunks. The XML documents sent to the solr
 server contain all documents in the order they arrived without any settings
 changed from the defaults (so overwrite = true). We are careful to avoid
 things like HashMaps on our side since they'd lose the order, but I can't
 be
 certain what occurs inside Solr.

 Sometimes if an object has been indexed twice for various reasons it could
 appear twice in the buffer but the most up-to-date version is always last.
 I
 have however observed instances where the first copy of the document is
 indexed and differences in the second copy are missing. Does this sound
 likely? And if so are there any obvious settings I can play with to get the
 behavior I desire?

 I looked at:
 http://wiki.apache.org/solr/UpdateXmlMessages

 but there is no mention of order, just the overwrite flag (which I'm unsure
 how it is applied internally to an update message) and the deprecated
 duplicates flag (which I have no idea about).

 Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as
 per http://wiki.apache.org/solr/Solrj? There is no mention of order there
 either, however.

 Thanks to anyone who took the time to read this.

 Ta,
 Greg



Re: list of filters/factories/Input handlers/blah blah

2010-09-08 Thread Erick Erickson
See the javadocs at:
http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html

also see:
http://wiki.apache.org/solr/LanguageAnalysis

Both of these are linked from the page Jonathan referenced. The JavaDocs
will be the most up to date...

Best
Erick

On Tue, Sep 7, 2010 at 11:56 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Not neccesarily definitive, but filters and tokenizers can be found here:

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

 Not sure if that's all of the analyzers (which I think is the generic name
 for both tokenizers and filters) that come with Solr, but I believe it's at
 least most of them. It's of course possible to write your own analyzers or
 use third party analyzers too, if there's a list of such available, I don't
 know about it, but it sure would be handy.

 Some Query parsers, which I _think_ is the right term for things you can
 pass as defType=something or {!type=something}, or one or two other things
 with different key names I forget, can be found here:


 http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers

 Along with lucene and dismax also mentioned on that page, I _think_
 that's the complete list of query parsers included with Solr 1.4, but
 someone PLEASE correct me if I'm wrong. It is indeed difficult to get a
 handle on this stuff for me too.

 Other than query parsers and analyzers, I'm not entirely certain what else
 falls in the category of I/O components.  I don't know anything about input
 handlers, myself.

 Jonathan
 
 From: Dennis Gearon [gear...@sbcglobal.net]
 Sent: Tuesday, September 07, 2010 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: list of filters/factories/Input handlers/blah blah

 Is there a definitive list of:

   filters
inputHandlers

 and other 'code fragments' that do I/O processing for Solr/Lucene?


 Dennis Gearon

 Signature Warning
 
 EARTH has a Right To Life,
  otherwise we all die.

 Read 'Hot, Flat, and Crowded'
 Laugh at http://www.yert.com/film.php



Re: Alphanumeric wildcard search problem

2010-09-08 Thread Erick Erickson
Ah, thanks. That reconciles our differing results.

Best
Erick

On Wed, Sep 8, 2010 at 2:58 AM, Hasnain hasn...@hotmail.com wrote:


 The real problem was this tag

 <!-- field for the QueryParser to use when an explicit fieldname is absent
 -->
 <defaultSearchField>text</defaultSearchField>

 and I was querying like q=r-1* instead of q=mat_nr:r-1*,
 so whatever fieldType I used for mat_nr, it was using the text fieldType, which
 has WordDelimiterFilterFactory; hence I had to put a space in order to get it
 running.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1437584.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Batch update, order of evaluation

2010-09-08 Thread Greg Pendlebury
Thanks,

I'll create a deliberate test tomorrow and feed some random data through it
several times to see what happens.

I'm also working on simply improving the buffer to handle the situation
internally, but a few hours of testing isn't a big deal.
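For what it's worth, one way to make the buffer "last version wins" regardless of how the server orders things is to key it by unique id before sending. A rough sketch (String stands in for the real document type; names are mine):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UpdateBufferDemo {
    // Buffer keyed by unique id: re-adding an id replaces the earlier copy,
    // so only the newest version of each document reaches Solr.
    private final Map<String, String> buffer = new LinkedHashMap<>();

    public void add(String id, String doc) {
        buffer.remove(id); // move a re-added id to the end (newest last)
        buffer.put(id, doc);
    }

    // Drain the buffer into one batch, preserving (deduplicated) order.
    public List<String> drain() {
        List<String> batch = new ArrayList<>(buffer.values());
        buffer.clear();
        return batch;
    }

    public static void main(String[] args) {
        UpdateBufferDemo b = new UpdateBufferDemo();
        b.add("doc1", "v1");
        b.add("doc2", "v1");
        b.add("doc1", "v2"); // newer version of doc1 replaces the old one
        System.out.println(b.drain());
    }
}
```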

Ta,
Greg

On 8 September 2010 21:41, Erick Erickson erickerick...@gmail.com wrote:

 This would be surprising behavior, if you can reliably reproduce this
 it's worth a JIRA.

 But (and I'm stretching a bit here) are you sure you're committing at the
 end of the batch AND are you sure you're looking after the commit? Here's
 the scenario: Your updated document is a position 1 and 100 in your batch.
 Somewhere around SOLR processing document 50, an autocommit occurs,
 and you're looking at your results before SOLR gets around to committing
 document 100. Like I said, it's a stretch.

 To test this, you need to be absolutely sure of two things before you
 search:
 1 the batch is finished processing
 2 you've issued a commit after the last document in the batch.

 If you're sure of the above and still see the problem, please let us
 know...

 HTH
 Erick

 On Tue, Sep 7, 2010 at 10:32 PM, Greg Pendlebury
  greg.pendleb...@gmail.com wrote:

  Does anyone know with certainty how (or even if) order is evaluated when
  updates are performed by batch?
 
  Our application internally buffers solr documents for speed of ingest
  before
  sending them to the server in chunks. The XML documents sent to the solr
  server contain all documents in the order they arrived without any
 settings
  changed from the defaults (so overwrite = true). We are careful to avoid
  things like HashMaps on our side since they'd lose the order, but I can't
  be
  certain what occurs inside Solr.
 
  Sometimes if an object has been indexed twice for various reasons it
 could
  appear twice in the buffer but the most up-to-date version is always
 last.
  I
  have however observed instances where the first copy of the document is
  indexed and differences in the second copy are missing. Does this sound
  likely? And if so are there any obvious settings I can play with to get
 the
  behavior I desire?
 
  I looked at:
  http://wiki.apache.org/solr/UpdateXmlMessages
 
  but there is no mention of order, just the overwrite flag (which I'm
 unsure
  how it is applied internally to an update message) and the deprecated
  duplicates flag (which I have no idea about).
 
  Would switching to SolrInputDocuments on a CommonsHttpSolrServer help? as
  per http://wiki.apache.org/solr/Solrj. This is no mention of order there
  either however.
 
  Thanks to anyone who took the time to read this.
 
  Ta,
  Greg
 



Re: Query result ranking - Score independent

2010-09-08 Thread Erick Erickson
The change in the schema shouldn't matter (emphasis on the should).

What version of SOLR are you using? I tried this query and it works just
fine for me, I'm using 1.4.1

Best
Erick

On Wed, Sep 8, 2010 at 4:38 AM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 My request was very simple:
 q= astronomy^0
 And Solr returned the exception.
 Maybe the zero boost factor is not causing the exception?

 1) We indexed n documents with a Schema.xml.
 2)Then we changed some field type in the Schema.xml
 3)Then we indexed other m documents

 Maybe this could cause the exception?



 2010/9/7 Grant Ingersoll gsing...@apache.org

 
  On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote:
 
   Hi all,
   I need to retrieve query-results with a ranking independent from each
   query-result's default lucene score, which means assigning the same
 score
  to
   each query result.
   I tried to use a zero boost factor ( ^0 ) to reset to zero each
   query-result's score.
   This strategy seems to work within the example solr instance, but in
 my
   Solr instance, using a zero boost factor causes a Buffer Exception
   (
   HTTP Status 500 - null java.lang.IllegalArgumentException at
   java.nio.Buffer.limit(Buffer.java:249) at
  
 
 org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123)
   at
  
 
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
   at
  
 
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
   at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at
   org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at
   org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at
   org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
 at
  
 
 org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
   at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
   )
 
  Hmm, that stack trace doesn't align w/ the boost factor.  What  was your
  request?  I think there might be something else wrong here.
 
   Do you know any other technique to reset to some fixed constant value,
  all
   the query-result's scores?
   Each query result should obtain the same score.
   Any suggestion?
 
 
  The ConstantScoreQuery or a Filter should do this.  You could do
 something
  like:
 
  q=*:*&fq=<the real query>, as in q=*:*&fq=field:foo
 
  -Grant
 
 
  --
  Grant Ingersoll
  http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct
 7-8
 
 


 --
 --

 Benedetti Alessandro
 Personal Page: http://tigerbolt.altervista.org

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England
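Grant's suggestion quoted above amounts to moving the real query into fq and querying q=*:*. A minimal sketch of building such a request client-side (illustrative only; the host, port, and core path are assumptions, not from the thread):

```python
from urllib.parse import urlencode

# Constant-score trick: q=*:* matches every document with the same score,
# while fq filters the result set without contributing to scoring at all.
params = urlencode({"q": "*:*", "fq": "field:foo"})
url = "http://localhost:8983/solr/select?" + params  # host/core assumed
print(params)
```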



Re: Query result ranking - Score independent

2010-09-08 Thread Erick Erickson
Ooops, hit send too quickly. Could you show us the entire URL you send
that produces the error?

Erick

On Wed, Sep 8, 2010 at 7:58 AM, Erick Erickson erickerick...@gmail.com wrote:

 The change in the schema shouldn't matter (emphasis on the should).

 What version of SOLR are you using? I tried this query and it works just
 fine for me, I'm using 1.4.1

 Best
 Erick


 On Wed, Sep 8, 2010 at 4:38 AM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

 My request was very simple:
 q= astronomy^0
 And Solr returned the exception.
 Maybe the zero boost factor is not causing the exception?

 1) We indexed n documents with a Schema.xml.
 2)Then we changed some field type in the Schema.xml
 3)Then we indexed other m documents

 Maybe this could cause the exception?



 2010/9/7 Grant Ingersoll gsing...@apache.org

 
  On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote:
 
   Hi all,
   I need to retrieve query-results with a ranking independent from each
   query-result's default lucene score, which means assigning the same
 score
  to
   each query result.
   I tried to use a zero boost factor ( ^0 ) to reset to zero each
   query-result's score.
   This strategy seems to work within the example solr instance, but in
 my
   Solr instance, using a zero boost factor causes a Buffer Exception
   (
   HTTP Status 500 - null java.lang.IllegalArgumentException at
   java.nio.Buffer.limit(Buffer.java:249) at
  
 
 org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123)
   at
  
 
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
   at
  
 
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
   at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at
   org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at
   org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at
   org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
 at
  
 
 org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
   at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
   )
 
  Hmm, that stack trace doesn't align w/ the boost factor.  What  was your
  request?  I think there might be something else wrong here.
 
   Do you know any other technique to reset to some fixed constant value,
  all
   the query-result's scores?
   Each query result should obtain the same score.
   Any suggestion?
 
 
  The ConstantScoreQuery or a Filter should do this.  You could do
 something
  like:
 
  q=*:*&fq=<the real query>, as in q=*:*&fq=field:foo
 
  -Grant
 
 
  --
  Grant Ingersoll
  http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct
 7-8
 
 


 --
 --

 Benedetti Alessandro
 Personal Page: http://tigerbolt.altervista.org

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England





Re: Solr, c/s type ?

2010-09-08 Thread Travis Low
I'll guess he means client/server.

On Tue, Sep 7, 2010 at 5:52 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : Subject: Solr, c/s type ?
 :
 : i'm wondering c/s type is possible (not http web type).
 : if possible, could i get the material about it?

 You're going to need to provide more info explaining what it is you are
 asking about -- i don't know about anyone else, but i honestly have
 absolutely no idea what you might possibly mean by c/s type is possible
 (not http web type)

 -Hoss

 --
 http://lucenerevolution.org/  ...  October 7-8, Boston
 http://bit.ly/stump-hoss  ...  Stump The Chump!




RE: Solr, c/s type ?

2010-09-08 Thread Jonathan Rochkind
 I'll guess he means client/server.

HTTP is a client/server protocol, isn't it?  


Re: Null Pointer Exception with shardsfacets where some shards have no values for some facets.

2010-09-08 Thread Yonik Seeley
On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer r...@0ape.com wrote:
 Short summary:
  * Mixing Facets and Shards give me a NullPointerException
    when not all docs have all facets.

https://issues.apache.org/jira/browse/SOLR-2110

I believe the underlying real issue stemmed from your use of a complex
key involvement/race_facet.

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


How to import data with a different date format

2010-09-08 Thread Rico Lelina
Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way 
I know because I literally only have 2 days to import the data and do some 
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it 
to the format in solr/example/exampledocs (except I retained the element names, 
so I had to modify schema.xml in the conf directory). So far so good -- the 
import works and I can search the data. One of my immediate problems is that 
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it 
seems SOLR accepts only full date fields -- everything seems to be mandatory, 
including the Z for Zulu/UTC time according to the doc. Is there a way to 
specify the date format?

Thanks very much.
Rico



Invariants on a specific fq value

2010-09-08 Thread Markus Jelsma
Hi,

I have an index with several collections. Every document has a collection 
field that specifies the collection it belongs to. To make querying easier 
(and restrict exposed parameters) i have a request handler for each 
collection. The request handlers are largely the same and preset all 
parameters using invariants.

Well, this is all very nice. But there is a catch, i cannot make an invariant 
of the fq parameter because it's being used (from the outside) to navigate 
through the facets. This means that the outside world can specify any value 
for the fq parameter.

With the fq parameter being exposed, it is possible for request handler X to 
query documents that belong to collection Y and vice versa. But, as you might 
guess by now, request handler X should only be allowed to retrieve documents 
that belong to collection X.

I know there are some discussions on how to restrict users to certain 
documents but i'd like to know if it is doable to patch the request handler 
logic to add an invariant-like directive that allows me to restrict a certain 
value for a certain parameter, but allow different values for that parameter.

To give an example:

<requestHandler name="collection_x">
  <lst name="invariants">
    <str name="defType">dismax</str>
    ... More invariants here
  </lst>

  <lst name="what_should_we_call_this?">
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>

The above configuration won't allow changing the defType and won't allow a 
value to be specified for the fieldName field through the fq parameter. It will 
allow the outside world to specify a value on another field through the fq 
parameter, such as fq=anotherField:someValue.

Any ideas? 


Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
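One way to picture the restriction Markus is asking for (pin fq on one field, allow it on every other field) is a small parameter filter. This is a hypothetical sketch of the desired logic, not existing Solr code; the function and names are invented for illustration:

```python
def restrict_fq(fq_values, pinned_field, pinned_value):
    """Drop any client-supplied fq on the pinned field, then enforce
    the handler's own collection filter on top."""
    allowed = [fq for fq in fq_values
               if not fq.startswith(pinned_field + ":")]
    allowed.append("%s:%s" % (pinned_field, pinned_value))
    return allowed

# Handler X keeps facet navigation on other fields, but cannot be
# steered into collection Y:
print(restrict_fq(["anotherField:someValue", "fieldName:collection_y"],
                  "fieldName", "collection_x"))
```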



RE: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
No. The Datefield [1] will not accept it any other way. You could, however, 
fool your boss and dump your dates in an ordinary string field. But then you 
cannot use some of the nice date features.

 

[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
 
-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org; 
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way 
I know because I literally only have 2 days to import the data and do some 
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it 
to the format in solr/example/exampledocs (except I retained the element names 
so I had to modify schema.xml in the conf directory. So far so good -- the 
import works and I can search the data. One of my immediate problems is that 
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it 
seems SOLR accepts only full date fields -- everything seems to be mandatory 
including the Z for Zulu/UTC time according to the doc. Is there a way to 
specify the date format?

Thanks very much.
Rico



Re: How to import data with a different date format

2010-09-08 Thread Rico Lelina
That was my first thought :-) But it would be nice to be able to do date 
queries. I guess when I export the data I can just add 00:00:00Z.

Thanks.


- Original Message 
From: Markus Jelsma markus.jel...@buyways.nl
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 11:34:32 AM
Subject: RE: How to import data with a different date format

No. The Datefield [1] will not accept it any other way. You could, however, 
fool 
your boss and dump your dates in an ordinary string field. But then you cannot 
use some of the nice date features.

 

[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html 
 
-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org; 
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way 
I know because I literally only have 2 days to import the data and do some 
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it 
to the format in solr/example/exampledocs (except I retained the element names 
so I had to modify schema.xml in the conf directory. So far so good -- the 
import works and I can search the data. One of my immediate problems is that 
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it 
seems SOLR accepts only full date fields -- everything seems to be mandatory 
including the Z for Zulu/UTC time according to the doc. Is there a way to 
specify the date format?

Thanks very much.
Rico


RE: Re: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
Your format (MM/DD/YYYY) is not compatible. 
 
-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 19:03
To: solr-user@lucene.apache.org; 
Subject: Re: How to import data with a different date format

That was my first thought :-) But it would be nice to be able to do date 
queries. I guess when I export the data I can just add 00:00:00Z.

Thanks.


- Original Message 
From: Markus Jelsma markus.jel...@buyways.nl
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 11:34:32 AM
Subject: RE: How to import data with a different date format

No. The Datefield [1] will not accept it any other way. You could, however, 
fool 
your boss and dump your dates in an ordinary string field. But then you cannot 
use some of the nice date features.



[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html 

-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org; 
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way 
I know because I literally only have 2 days to import the data and do some 
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it 
to the format in solr/example/exampledocs (except I retained the element names 
so I had to modify schema.xml in the conf directory. So far so good -- the 
import works and I can search the data. One of my immediate problems is that 
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it 
seems SOLR accepts only full date fields -- everything seems to be mandatory 
including the Z for Zulu/UTC time according to the doc. Is there a way to 
specify the date format?

Thanks very much.
Rico

 


Re: How to import data with a different date format

2010-09-08 Thread Erick Erickson
I think Markus is spot-on given the fact that you have 2 days. Using a
string field is quickest.

However, if you absolutely MUST have functioning dates, there are three
options I can think of:
1> can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant
2> use DIH and DateFormatTransformer, see:
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
   you can walk a directory importing all the XML files with FileDataSource.
3> you could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent
doing the other stuff and just using string as per Markus. I have no clue
how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the string field type, which is NOT tokenized.
You usually want text unless it's some sort of ID. So it might be worth
it to do some searching earlier rather than later <G>

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote:

 No. The Datefield [1] will not accept it any other way. You could, however,
 fool your boss and dump your dates in an ordinary string field. But then you
 cannot use some of the nice date features.



 [1]:
 http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

 -Original message-
 From: Rico Lelina rlel...@yahoo.com
 Sent: Wed 08-09-2010 17:36
 To: solr-user@lucene.apache.org;
 Subject: How to import data with a different date format

 Hi,

 I am attempting to import some of our data into SOLR. I did it the quickest
 way
 I know because I literally only have 2 days to import the data and do some
 queries for a proof-of-concept.

 So I have this data in XML format and I wrote a short XSLT script to
 convert it
 to the format in solr/example/exampledocs (except I retained the element
 names
 so I had to modify schema.xml in the conf directory. So far so good -- the
 import works and I can search the data. One of my immediate problems is
 that
 there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it
 seems SOLR accepts only full date fields -- everything seems to be
 mandatory
 including the Z for Zulu/UTC time according to the doc. Is there a way to
 specify the date format?

 Thanks very much.
 Rico




Re: Invariants on a specific fq value

2010-09-08 Thread Jonathan Rochkind
I just found out about 'invariants', and I found out about another thing 
too: appends.   (I don't think either of these are actually documented 
anywhere?).


I think maybe appends rather than invariants, with the fq you want 
always to be there, might be exactly what you want?


I actually forget whether it's append or appends, and am not sure if 
it's documented anywhere, try both I guess. But apparently it does exist 
in 1.4.


Jonathan

Markus Jelsma wrote:

Hi,

I have an index with several collections. Every document has a collection 
field that specifies the collection it belongs to. To make querying easier 
(and restrict exposed parameters) i have a request handler for each 
collection. The request handlers are largely the same and preset all 
parameters using invariants.


Well, this is all very nice. But there is a catch, i cannot make an invariant 
of the fq parameter because it's being used (from the outside) to navigate 
through the facets. This means that the outside world can specify any value 
for the fq parameter.


With the fq parameter being exposed, it is possible for request handler X to 
query documents that belong to collection Y and vice versa. But, as you might 
guess by now, request handler X should only be allowed to retrieve documents 
that belong to collection X.


I know there are some discussions on how to restrict users to certain 
documents but i'd like to know if it is doable to patch the request handler 
logic to add an invariant-like directive that allows me to restrict a certain 
value for a certain parameter, but allow different values for that parameter.


To give an example:

<requestHandler name="collection_x">
  <lst name="invariants">
    <str name="defType">dismax</str>
    ... More invariants here
  </lst>

  <lst name="what_should_we_call_this?">
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>

The above configuration won't allow changing the defType and won't allow a 
value to be specified for the fieldName field through the fq parameter. It will 
allow the outside world to specify a value on another field through the fq 
parameter, such as fq=anotherField:someValue.


Any ideas? 



Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

  


Re: Re: How to import data with a different date format

2010-09-08 Thread Rico Lelina
It will work. The original data is in XML format. I have an XSLT that 
transforms 
the data into the same format as that in exampledocs: 
<add><doc><field name="..">...</field></doc></add>.



- Original Message 
From: Markus Jelsma markus.jel...@buyways.nl
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 12:06:39 PM
Subject: RE: Re: How to import data with a different date format

Your format (MM/DD/) is not compatible. 
 
-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 19:03
To: solr-user@lucene.apache.org; 
Subject: Re: How to import data with a different date format

That was my first thought :-) But it would be nice to be able to do date 
queries. I guess when I export the data I can just add 00:00:00Z.

Thanks.


- Original Message 
From: Markus Jelsma markus.jel...@buyways.nl
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 11:34:32 AM
Subject: RE: How to import data with a different date format

No. The Datefield [1] will not accept it any other way. You could, however, 
fool 

your boss and dump your dates in an ordinary string field. But then you cannot 
use some of the nice date features.



[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html 

-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org; 
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest way 
I know because I literally only have 2 days to import the data and do some 
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to convert it 
to the format in solr/example/exampledocs (except I retained the element names 
so I had to modify schema.xml in the conf directory. So far so good -- the 
import works and I can search the data. One of my immediate problems is that 
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it 
seems SOLR accepts only full date fields -- everything seems to be mandatory 
including the Z for Zulu/UTC time according to the doc. Is there a way to 
specify the date format?

Thanks very much.
Rico


Re: How to import data with a different date format

2010-09-08 Thread Rico Lelina
I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly 
easy in XSLT) and then adding T00:00:00Z to it.

Thanks.
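The conversion Rico settled on can also be sketched outside XSLT; Python is used here purely for illustration of the transform itself:

```python
from datetime import datetime

def to_solr_date(us_date):
    """MM/DD/YYYY -> Solr DateField form YYYY-MM-DDT00:00:00Z."""
    return datetime.strptime(us_date, "%m/%d/%Y").strftime(
        "%Y-%m-%dT00:00:00Z")

print(to_solr_date("09/08/2010"))  # 2010-09-08T00:00:00Z
```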



- Original Message 
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 12:09:55 PM
Subject: Re: How to import data with a different date format

I think Markus is spot-on given the fact that you have 2 days. Using a
string field is quickest.

However, if you absolutely MUST have functioning dates, there are three
options I can think of:
1> can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant
2> use DIH and DateFormatTransformer, see:
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
   you can walk a directory importing all the XML files with FileDataSource.
3> you could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent
doing the other stuff and just using string as per Markus. I have no clue
how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the string field type, which is NOT tokenized.
You usually want text unless it's some sort of ID. So it might be worth
it to do some searching earlier rather than later <G>

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote:

 No. The Datefield [1] will not accept it any other way. You could, however,
 fool your boss and dump your dates in an ordinary string field. But then you
 cannot use some of the nice date features.



 [1]:
 http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

 -Original message-
 From: Rico Lelina rlel...@yahoo.com
 Sent: Wed 08-09-2010 17:36
 To: solr-user@lucene.apache.org;
 Subject: How to import data with a different date format

 Hi,

 I am attempting to import some of our data into SOLR. I did it the quickest
 way
 I know because I literally only have 2 days to import the data and do some
 queries for a proof-of-concept.

 So I have this data in XML format and I wrote a short XSLT script to
 convert it
 to the format in solr/example/exampledocs (except I retained the element
 names
 so I had to modify schema.xml in the conf directory. So far so good -- the
 import works and I can search the data. One of my immediate problems is
 that
 there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it
 seems SOLR accepts only full date fields -- everything seems to be
 mandatory
 including the Z for Zulu/UTC time according to the doc. Is there a way to
 specify the date format?

 Thanks very much.
 Rico





RE: Re: How to import data with a different date format

2010-09-08 Thread Markus Jelsma
Ah, that answers Erick's question. And mine ;) 
 
-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 19:25
To: solr-user@lucene.apache.org; 
Subject: Re: How to import data with a different date format

I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly 
easy in XSLT) and then adding T00:00:00Z to it.

Thanks.



- Original Message 
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 12:09:55 PM
Subject: Re: How to import data with a different date format

I think Markus is spot-on given the fact that you have 2 days. Using a
string field is quickest.

However, if you absolutely MUST have functioning dates, there are three
options I can think of:
1> can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant
2> use DIH and DateFormatTransformer, see:
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
   you can walk a directory importing all the XML files with FileDataSource.
3> you could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent
doing the other stuff and just using string as per Markus. I have no clue
how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the string field type, which is NOT tokenized.
You usually want text unless it's some sort of ID. So it might be worth
it to do some searching earlier rather than later <G>

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote:

 No. The Datefield [1] will not accept it any other way. You could, however,
 fool your boss and dump your dates in an ordinary string field. But then you
 cannot use some of the nice date features.



 [1]:
 http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

 -Original message-
 From: Rico Lelina rlel...@yahoo.com
 Sent: Wed 08-09-2010 17:36
 To: solr-user@lucene.apache.org;
 Subject: How to import data with a different date format

 Hi,

 I am attempting to import some of our data into SOLR. I did it the quickest
 way
 I know because I literally only have 2 days to import the data and do some
 queries for a proof-of-concept.

 So I have this data in XML format and I wrote a short XSLT script to
 convert it
 to the format in solr/example/exampledocs (except I retained the element
 names
 so I had to modify schema.xml in the conf directory. So far so good -- the
 import works and I can search the data. One of my immediate problems is
 that
 there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it
 seems SOLR accepts only full date fields -- everything seems to be
 mandatory
 including the Z for Zulu/UTC time according to the doc. Is there a way to
 specify the date format?

 Thanks very much.
 Rico





Solr Highlighting Question

2010-09-08 Thread Jed Glazner

Thanks for taking time to read through this.  I'm using a checkout from

the solr 3.x branch

My problem is with the highlighter and wildcards

I can get the highlighter to work with wildcards just fine; the problem
is that solr is returning the whole term matched, when what I want it to do
is highlight only the chars in the term that were matched.


Example:

http://192.168.1.75:8983/solr/music/select?indent=on&q=name_title:wel*&qt=beyond&hl=true&hl.fl=name_title&f.name_title.hl.usePhraseHighlighter=true&f.name_title.hl.highlightMultiTerm=true

The results that come back look like this:

<em>Welcome</em> to the Jungle

What I want them to look like is this:
<em>Wel</em>come to the Jungle

  From what I gathered by searching the archives, solr 1.1 used to
do this... Is there a way to get that functionality?

Thanks!
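No Solr-side switch for this is mentioned in the thread; here is a client-side post-processing sketch of the desired prefix-only emphasis (illustrative only, with the <em> markup assumed from the example output above):

```python
import re

def highlight_prefix(text, prefix):
    """Emphasize only the characters a wildcard prefix actually matched."""
    pattern = re.compile(r"\b(%s)(\w*)" % re.escape(prefix), re.IGNORECASE)
    return pattern.sub(r"<em>\1</em>\2", text)

print(highlight_prefix("Welcome to the Jungle", "wel"))
# <em>Wel</em>come to the Jungle
```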



Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind
Just throwing it out there, I'd consider a different approach for an 
actual real app, although it might not be easier to get up quickly. (For 
quickly, yeah, I'd just store it as a string, more on that at bottom).


If none of your dates have times, they're all just full days, I'm not 
sure you really need the date type at all.


Convert the date to number-of-days since epoch integer.  (Most languages 
will have a way to do this, but I don't know about pure XSLT).  Store 
_that_ in a 1.4 'int' field.  On top of that, make it a tint 
(precision non-zero) for faster range queries.
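Jonathan's days-since-epoch encoding, sketched in Python (illustrative; the epoch choice of 1970-01-01 is an assumption):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def to_days(d):
    # Whole-day integer: compact, sortable, and cheap for range queries
    # when stored in an int/tint field.
    return (d - EPOCH).days

def from_days(n):
    # Inverse, for converting back to a displayable date at the
    # interface layer.
    return EPOCH + timedelta(days=n)

print(to_days(date(2010, 9, 8)))   # 14860
print(from_days(14860))            # 2010-09-08
```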


But now your actual interface will have to convert from number of days 
since epoch to a displayable date. (And if you allow user input, 
convert the input to number-of-days-since-epoch before making a range 
query or fq, but you'd have to do that anyway even with solr dates, 
users aren't going to be entering W3CDate raw, I don't think).


That is probably the most efficient way to have solr handle it -- using 
an actual date field type gives you a lot more precision than you need, 
which is going to hurt performance on range queries. Which you can 
compensate for with trie date sure, but if you don't really need that 
precision to begin with, why use it?  Also the extra precision can end 
up doing unexpected things and making it easier to have bugs (range 
queries on that high precision stuff, you need to make sure your start 
date has 00:00:00 set and your end date has 23:59:59 set, to do what you 
probably expect). If you aren't going to use the extra precision, makes 
everything a lot simpler to not use a date field.


Alternately, for your get this done quick method, yeah, I'd just store 
it as a string. With a string exactly as you've specified, sorting and 
range queries won't work how you'd want.  But if you can make it a 
string of the format yyyy/mm/dd instead (always two-digit month and 
day), then you can even sort and do range queries on your string dates. 
For the quick and dirty prototype, I'd just do that.  In fact, while 
this might make range queries and sorting _slightly_ slower than if you 
use an int or a tint, this might really be good enough even for a real 
app (hey, it's what lots of people did before the trie-based fields 
existed).
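The zero-padded-string point is easy to check: lexicographic order on yyyy/mm/dd strings coincides with chronological order, which is what makes string-field sorting and range queries behave (a quick illustration):

```python
dates = ["2010/09/08", "2009/12/31", "2010/01/02"]
# Zero-padded strings: lexicographic order == chronological order.
print(sorted(dates))  # ['2009/12/31', '2010/01/02', '2010/09/08']

# Without padding the ordering breaks: "1" sorts before "9", so
# October lands before September.
bad = ["2010/9/8", "2010/10/1"]
print(sorted(bad))    # ['2010/10/1', '2010/9/8']
```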


Jonathan

Erick Erickson wrote:

I think Markus is spot-on given the fact that you have 2 days. Using a
string field is quickest.

However, if you absolutely MUST have functioning dates, there are three
options I can think of:
1> can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant
2> use DIH and DateFormatTransformer, see:
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
   you can walk a directory importing all the XML files with FileDataSource.
3> you could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent
doing the other stuff and just using string as per Markus. I have no clue
how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the string field type, which is NOT tokenized.
You usually want text unless it's some sort of ID. So it might be worth
it to do some searching earlier rather than later <G>

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote:

  

No. The Datefield [1] will not accept it any other way. You could, however,
fool your boss and dump your dates in an ordinary string field. But then you
cannot use some of the nice date features.



[1]:
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org;
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest
way
I know because I literally only have 2 days to import the data and do some
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to
convert it
to the format in solr/example/exampledocs (except I retained the element
names
so I had to modify schema.xml in the conf directory. So far so good -- the
import works and I can search the data. One of my immediate problems is
that
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it
seems SOLR accepts only full date fields -- everything seems to be
mandatory
including the Z for Zulu/UTC time according to the doc. Is there a way to
specify the date format?

Thanks very much.
Rico





  


Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind
I'm really thinking, once you convert to YYYY-MM-DD anyway, you might be 
better off just sticking this in a string field, rather than using a 
date field at all. The extra precision in the date field is going to 
make things confusing later, I predict. Especially for a quick and dirty 
prototype, I'd just use a string.


Solr is not an rdbms, our learned behavior to always try and normalize 
everything and define the field 'right' often is not the right way to go 
with solr/lucene.


Jonathan

Rico Lelina wrote:
I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly 
easy in XSLT) and then adding T00:00:00Z to it.


Thanks.



- Original Message 
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, September 8, 2010 12:09:55 PM
Subject: Re: How to import data with a different date format

I think Markus is spot-on given the fact that you have 2 days. Using a
string field is quickest.

However, if you absolutely MUST have functioning dates, there are three
options I can think of:
1 can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant
2 use DIH and DateTransformer, see:
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
  you can walk a directory importing all the XML files with
FileDataSource.
3 you could write a program to do this manually.

But given the time constraints, I suspect your time would be better spent
doing the other stuff and just using string as per Markus. I have no clue
how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the string field type, which is NOT tokenized.
You usually want text unless it's some sort of ID. So it might be worth
it to do some searching earlier rather than later G

Best
Erick

On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote:

  

No. The Datefield [1] will not accept it any other way. You could, however,
fool your boss and dump your dates in an ordinary string field. But then you
cannot use some of the nice date features.



[1]:
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

-Original message-
From: Rico Lelina rlel...@yahoo.com
Sent: Wed 08-09-2010 17:36
To: solr-user@lucene.apache.org;
Subject: How to import data with a different date format

Hi,

I am attempting to import some of our data into SOLR. I did it the quickest
way
I know because I literally only have 2 days to import the data and do some
queries for a proof-of-concept.

So I have this data in XML format and I wrote a short XSLT script to
convert it
to the format in solr/example/exampledocs (except I retained the element
names
so I had to modify schema.xml in the conf directory. So far so good -- the
import works and I can search the data. One of my immediate problems is
that
there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it
seems SOLR accepts only full date fields -- everything seems to be
mandatory
including the Z for Zulu/UTC time according to the doc. Is there a way to
specify the date format?

Thanks very much.
Rico
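
Rico's option 1 -- converting MM/DD/YYYY to YYYY-MM-DD and appending
T00:00:00Z -- can be sketched in a few lines. This is a minimal illustration
added for reference, not code from the thread:

```python
from datetime import datetime

def to_solr_date(mdy: str) -> str:
    """Convert an MM/DD/YYYY string to Solr's full DateField format."""
    d = datetime.strptime(mdy, "%m/%d/%Y")
    # DateField requires the full timestamp, including the trailing Z.
    return d.strftime("%Y-%m-%dT00:00:00Z")

print(to_solr_date("09/08/2010"))  # 2010-09-08T00:00:00Z
```

The same mapping is straightforward in XSLT 1.0 with substring() calls, since
it is pure string rearrangement.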






  


RE: Re: Invariants on a specific fq value

2010-09-08 Thread Markus Jelsma
Interesting! I haven't met the appends method before and i'll be sure to give 
it a try tomorrow. Though, the wiki [1] is not very clear on what it really does.

 

More suggestions before tomorrow?

 

[1]: http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication
 
-Original message-
From: Jonathan Rochkind rochk...@jhu.edu
Sent: Wed 08-09-2010 19:19
To: solr-user@lucene.apache.org; markus.jel...@buyways.nl; 
Subject: Re: Invariants on a specific fq value

I just found out about 'invariants', and I found out about another thing 
too: appends.   (I don't think either of these are actually documented 
anywhere?).

I think maybe appends rather than invariants, with your fq you want 
always to be there might be exactly what you want?

I actually forget whether it's append or appends, and am not sure if 
it's documented anywhere, try both I guess. But apparently it does exist 
in 1.4.

Jonathan

Markus Jelsma wrote:
 Hi,

 I have an index with several collections. Every document has a collection 
 field that specifies the collection it belongs to. To make querying easier 
 (and restrict exposed parameters) i have a request handler for each 
 collection. The request handlers are largely the same and preset all 
 parameters using invariants.

 Well, this is all very nice. But there is a catch, i cannot make an invariant 
 of the fq parameter because it's being used (from the outside) to navigate 
 through the facets. This means that the outside world can specify any value 
 for the fq parameter.

 With the fq parameter being exposed, it is possible for request handler X to 
 query documents that belong to collection Y and vice versa. But, as you might 
 guess by now, request handler X should only be allowed to retrieve documents 
 that belong to collection X.

 I know there are some discussions on how to restrict users to certain 
 documents but i'd like to know if it is doable to patch the request handler 
 logic to add an invariant-like directive that allows me to restrict a certain 
 value for a certain parameter, but allow different values for that parameter.

 To give an example:

 <requestHandler name="collection_x">
 <lst name="invariants">
 <str name="defType">dismax</str>
 ... More invariants here
 </lst>

 <lst name="what_should_we_call_this?">
 <str name="fq">fieldName:collection_x</str>
 </lst>
 </requestHandler>

 The above configuration won't allow to change the defType and won't allow a 
 value to be specified for the fieldName through the fq parameter. It will 
 allow the outside world to specify a value on another field through the fq 
 parameter such as : fq:anotherField:someValue.

 Any ideas? 


 Cheers,

 Markus Jelsma - Technisch Architect - Buyways BV
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350

   

 


Re: Invariants on a specific fq value

2010-09-08 Thread Jonathan Rochkind
Ah, I NEVER would have thought to look for these 
defaults/invariants/appends stuff under 'security', that's why I never 
found it!  I can see now why it's sort of a security issue, but I, like 
you, use them just for convenience instead, and think of defaults, 
invariants, and appends as all in the same family, with different logic 
choices.


Markus Jelsma wrote:

Interesting! I haven't met the appends method before and i'll be sure to give 
it a try tomorrow. Try, the wiki [1] is not very clear on what it really does.

 


More suggestions before tomorrow?

 


[1]: http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication
 
-Original message-

From: Jonathan Rochkind rochk...@jhu.edu
Sent: Wed 08-09-2010 19:19
To: solr-user@lucene.apache.org; markus.jel...@buyways.nl; 
Subject: Re: Invariants on a specific fq value


I just found out about 'invariants', and I found out about another thing 
too: appends.   (I don't think either of these are actually documented 
anywhere?).


I think maybe appends rather than invariants, with your fq you want 
always to be there might be exactly what you want?


I actually forget whether it's append or appends, and am not sure if 
it's documented anywhere, try both I guess. But apparently it does exist 
in 1.4.


Jonathan

Markus Jelsma wrote:
  

Hi,

I have an index with several collections. Every document has a collection 
field that specifies the collection it belongs to. To make querying easier 
(and restrict exposed parameters) i have a request handler for each 
collection. The request handlers are largely the same and preset all 
parameters using invariants.


Well, this is all very nice. But there is a catch, i cannot make an invariant 
of the fq parameter because it's being used (from the outside) to navigate 
through the facets. This means that the outside world can specify any value 
for the fq parameter.


With the fq parameter being exposed, it is possible for request handler X to 
query documents that belong to collection Y and vice versa. But, as you might 
guess by now, request handler X should only be allowed to retrieve documents 
that belong to collection X.


I know there are some discussions on how to restrict users to certain 
documents but i'd like to know if it is doable to patch the request handler 
logic to add an invariant-like directive that allows me to restrict a certain 
value for a certain parameter, but allow different values for that parameter.


To give an example:

<requestHandler name="collection_x">
<lst name="invariants">
<str name="defType">dismax</str>
... More invariants here
</lst>

<lst name="what_should_we_call_this?">
<str name="fq">fieldName:collection_x</str>
</lst>
</requestHandler>

The above configuration won't allow to change the defType and won't allow a 
value to be specified for the fieldName through the fq parameter. It will 
allow the outside world to specify a value on another field through the fq 
parameter such as : fq:anotherField:someValue.


Any ideas? 



Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

  



 

  


Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind



how SOLR-savvy you are, so pardon if this is something you already know. But
lots of people trip up over the string field type, which is NOT tokenized.
You usually want text unless it's some sort of ID So it might be worth
it to do some searching earlier rather than later G
  

Why would you want to tokenize a yyyy-mm-dd value?

I'm liking the 'string' type.  If you do yyyy-mm-dd, then you can even 
sort properly, and range query with endpoints also specified as 
yyyy-mm-dd, no?


Okay, I'll stop spamming the thread now, heh.

Jonathan
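
Jonathan's point -- that zero-padded yyyy-mm-dd strings sort and range-query
correctly as plain strings -- is easy to verify. An illustrative check, not
from the thread:

```python
dates = ["2010-09-08", "2009-12-31", "2010-01-02"]

# Lexicographic order equals chronological order for zero-padded yyyy-mm-dd.
assert sorted(dates) == ["2009-12-31", "2010-01-02", "2010-09-08"]

# A plain string comparison behaves like a date range check (endpoints
# inclusive), which is why string-field range queries on this format work.
in_2010 = [d for d in dates if "2010-01-01" <= d <= "2010-12-31"]
print(in_2010)  # ['2010-09-08', '2010-01-02']
```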



Re: Re: Invariants on a specific fq value

2010-09-08 Thread Yonik Seeley
On Wed, Sep 8, 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote:
 Interesting! I haven't met the appends method before and i'll be sure to give 
 it a try tomorrow. Try, the wiki [1] is not very clear on what it really does.

Here's a comment from the example solrconfig.xml:

<!-- In addition to defaults, "appends" params can be specified
 to identify values which should be appended to the list of
 multi-val params from the query (or the existing defaults).

 In this example, the param fq=instock:true will be appended to
 any query time fq params the user may specify, as a mechanism for
 partitioning the index, independent of any user selected filtering
 that may also be desired (perhaps as a result of faceted searching).

 NOTE: there is *absolutely* nothing a client can do to prevent these
 "appends" values from being used, so don't use this mechanism
 unless you are sure you always want it.
  -->

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8
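
Putting Yonik's note together with Markus's earlier handler, the
per-collection configuration might look like this. A sketch only -- names are
taken from the earlier mail, and the class attribute is assumed:

```xml
<!-- Sketch: "appends" locks in the collection filter while leaving
     client-supplied fq facet filters free to stack on top of it. -->
<requestHandler name="collection_x" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
  </lst>
  <lst name="appends">
    <!-- Always ANDed into the query; clients cannot remove it. -->
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>
```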


RE: Re: Re: Invariants on a specific fq value

2010-09-08 Thread Markus Jelsma
Sounds great! I'll be very sure to put it to the test tomorrow and perhaps add 
documentation on these types to the solrconfigxml wiki page for reference.

 


 
-Original message-
From: Yonik Seeley yo...@lucidimagination.com
Sent: Wed 08-09-2010 19:38
To: solr-user@lucene.apache.org; 
Subject: Re: Re: Invariants on a specific fq value

On Wed, Sep 8, 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote:
 Interesting! I haven't met the appends method before and i'll be sure to give 
 it a try tomorrow. Try, the wiki [1] is not very clear on what it really does.

Here's a comment from the example solrconfig.xml:

   <!-- In addition to defaults, "appends" params can be specified
        to identify values which should be appended to the list of
        multi-val params from the query (or the existing defaults).

        In this example, the param fq=instock:true will be appended to
        any query time fq params the user may specify, as a mechanism for
        partitioning the index, independent of any user selected filtering
        that may also be desired (perhaps as a result of faceted searching).

        NOTE: there is *absolutely* nothing a client can do to prevent these
        "appends" values from being used, so don't use this mechanism
        unless you are sure you always want it.
     -->

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


Re: How to import data with a different date format

2010-09-08 Thread Dennis Gearon
I'm doing something similar for dates/times/timestamps.

I'm actually trying to do, 'now' is within the range of what 
appointments(date/time from and to combos, i.e. timestamps).

Fairly simple search of:

   What items have a start time BEFORE now, and an end time AFTER now?

My thoughts were to store:
  unix time stamp BIGINTS (64 bit)
  ISO_DATE ISO_TIME strings

Which is going to be faster:
   1/ Indexing?
   2/ Searching?

How does the 'tint' field mentioned below apply?



Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
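
[Dennis's "now is within the range" search maps onto two range filters. A
hedged sketch of the request parameters -- the field names start_time and
end_time are invented for illustration:]

```python
from urllib.parse import urlencode

# Hypothetical field names. With a Solr date field, NOW is evaluated
# server-side; with unix-timestamp ints you would substitute the current
# epoch seconds yourself before building the query.
params = [
    ("q", "*:*"),
    ("fq", "start_time:[* TO NOW]"),  # start time BEFORE now
    ("fq", "end_time:[NOW TO *]"),    # end time AFTER now
]
query_string = urlencode(params)  # URL-encoded, ready to append to /select?
print(query_string)
```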


--- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote:

 From: Jonathan Rochkind rochk...@jhu.edu
 Subject: Re: How to import data with a different date format
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Wednesday, September 8, 2010, 10:27 AM
 Just throwing it out there, I'd
 consider a different approach for an actual real app,
 although it might not be easier to get up quickly. (For
 quickly, yeah, I'd just store it as a string, more on that
 at bottom).
 
 If none of your dates have times, they're all just full
 days, I'm not sure you really need the date type at all.
 
 Convert the date to number-of-days since epoch
 integer.  (Most languages will have a way to do this,
 but I don't know about pure XSLT).  Store _that_ in a
 1.4 'int' field.  On top of that, make it a tint
 (precision non-zero) for faster range queries.
 
 But now your actual interface will have to convert from
 number of days since epoch to a displayable date. (And if
 you allow user input, convert the input to
 number-of-days-since-epoch before making a range query or
 fq, but you'd have to do that anyway even with solr dates,
 users aren't going to be entering W3CDate raw, I don't
 think).
 
 That is probably the most efficient way to have solr handle
 it -- using an actual date field type gives you a lot more
 precision than you need, which is going to hurt performance
 on range queries. Which you can compensate for with trie
 date sure, but if you don't really need that precision to
 begin with, why use it?  Also the extra precision can
 end up doing unexpected things and making it easier to have
 bugs (range queries on that high precision stuff, you need
 to make sure your start date has 00:00:00 set and your end
 date has 23:59:59 set, to do what you probably expect). If
 you aren't going to use the extra precision, makes
 everything a lot simpler to not use a date field.
 
 Alternately, for your get this done quick method, yeah,
 I'd just store it as a string. With a string exactly as
 you've specified, sorting and range queries won't work how
 you'd want.  But if you can make it a string of the
  format yyyy/mm/dd instead (always two-digit month and
  day), then you can even sort and do range queries on your
 string dates. For the quick and dirty prototype, I'd just do
 that.  In fact, while this might make range queries and
 sorting _slightly_ slower than if you use an int or a tint,
 this might really be good enough even for a real app (hey,
 it's what lots of people did before the trie-based fields
 existed).
 
 Jonathan
 
 Erick Erickson wrote:
  I think Markus is spot-on given the fact that you have
 2 days. Using a
  string field is quickest.
  
  However, if you absolutely MUST have functioning
 dates, there are three
  options I can think of:
  1 can you make your XSLT transform the dates?
 Confession; I'm XSLT-ignorant
  2 use DIH and DateTransformer, see:
  http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
        you can walk a
 directory importing all the XML files with
  FileDataSource.
  3 you could write a program to do this manually.
  
  But given the time constraints, I suspect your time
 would be better spent
  doing the other stuff and just using string as per
 Markus. I have no clue
  how SOLR-savvy you are, so pardon if this is something
 you already know. But
  lots of people trip up over the string field type,
 which is NOT tokenized.
  You usually want text unless it's some sort of
 ID So it might be worth
  it to do some searching earlier rather than later
 G
  
  Best
  Erick
  
  On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma 
  markus.jel...@buyways.nl wrote:
  
    
  No. The Datefield [1] will not accept it any other
 way. You could, however,
  fool your boss and dump your dates in an ordinary
 string field. But then you
  cannot use some of the nice date features.
  
  
  
  [1]:
  http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
  
  -Original message-
  From: Rico Lelina rlel...@yahoo.com
  Sent: Wed 08-09-2010 17:36
  To: solr-user@lucene.apache.org;
  Subject: How to import data with a different date
 format
  
  Hi,
  
  I am attempting to import some of our data into
 SOLR. I did it the 

Re: Invariants on a specific fq value

2010-09-08 Thread Jonathan Rochkind
If there is no default or request-provided value, will the appends 
still be used?  I suspect so, but let us know, perhaps by adding it to 
the wiki page!


Markus Jelsma wrote:

Sounds great! I'll be very sure to put it to the test tomorrow and perhaps add 
documentation on these types to the solrconfigxml wiki page for reference.

 



 
-Original message-

From: Yonik Seeley yo...@lucidimagination.com
Sent: Wed 08-09-2010 19:38
To: solr-user@lucene.apache.org; 
Subject: Re: Re: Invariants on a specific fq value


On Wed, Sep 8, 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote:
  

Interesting! I haven't met the appends method before and i'll be sure to give 
it a try tomorrow. Try, the wiki [1] is not very clear on what it really does.



Here's a comment from the example solrconfig.xml:

   <!-- In addition to defaults, "appends" params can be specified
to identify values which should be appended to the list of
multi-val params from the query (or the existing defaults).

In this example, the param fq=instock:true will be appended to
any query time fq params the user may specify, as a mechanism for
partitioning the index, independent of any user selected filtering
that may also be desired (perhaps as a result of faceted searching).

NOTE: there is *absolutely* nothing a client can do to prevent these
"appends" values from being used, so don't use this mechanism
unless you are sure you always want it.
 -->

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8

  


Re: How to import data with a different date format

2010-09-08 Thread Erick Erickson
That was a general comment on SOLR string types. Mostly I wanted to
prompt Rico to try some searching before getting too hung up on indexing
refinements. I'd far rather demo a prototype being able to say Dates don't
work yet, but you can search than searching is broken to pieces, but
dates work fine!.

FWIW
Erick

On Wed, Sep 8, 2010 at 1:33 PM, Jonathan Rochkind rochk...@jhu.edu wrote:


  how SOLR-savvy you are, so pardon if this is something you already know.
 But
 lots of people trip up over the string field type, which is NOT
 tokenized.
 You usually want text unless it's some sort of ID So it might be
 worth
 it to do some searching earlier rather than later G


  Why would you want to tokenize a yyyy-mm-dd value?

 I'm liking the 'string' type.  If you do yyyy-mm-dd, then you can even sort
 properly, and range query with endpoints also specified as yyyy-mm-dd, no?

 Okay, I'll stop spamming the thread now, heh.

 Jonathan




Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind
So the standard 'int' field in Solr 1.4 is a trie based field, 
although the example int type in the default schema.xml has a 
precision set to 0, which means it's not really doing trie things. 
If you set the precision to something greater than 0, as in the default 
example tint type, then it's really using 'trie' functionality.  
'trie' functionality speeds up range queries by putting each value into 
'buckets' (my own term), per the precision specified, so solr has to do 
less to grab all values within a certain range.


That's all tint/non-zero-precision-trie does, speed up range queries. 
Your use case involves range queries though, so it's worth 
investigating.  If you use a string or other textual type for sorting or 
range queries, you need to make sure your values sort the way you want 
them to as strings. But yyyy-mm-dd will.


More on trie: 
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/


I think there probably won't be much of a difference at query time 
between non-trie int and string, although I'm not sure, and it may 
depend on the nature of your data and queries.   Using a trie int will 
be faster for (and only for) range queries, if you have a lot of data. 
(There are some cases, depending on the data and the nature of your 
queries, where the overhead of a non-zero-precision trie may outweigh 
the hypothetical gain, but generally it's faster). 

I don't think there should be any appreciable difference between how 
long a non-trie int or a string will take to index -- at least as far as 
solr is concerned, if your app preparing the documents for solr takes 
longer to prepare one than another, that's another story. An actual trie 
(non-zero-precision) theoretically has indexing-time overhead, but I 
doubt it would be noticeable, unless you have a really really lean mean 
indexing setup where ever microsecond counts.


Jonathan

Dennis Gearon wrote:

I'm doing something similar for dates/times/timestamps.

I'm actually trying to do, 'now' is within the range of what 
appointments(date/time from and to combos, i.e. timestamps).

Fairly simple search of:

   What items have a start time BEFORE now, and an end time AFTER now?

My thoughts were to store:
  unix time stamp BIGINTS (64 bit)
  ISO_DATE ISO_TIME strings

Which is going to be faster:
   1/ Indexing?
   2/ Searching?

How does the 'tint' field mentioned below apply?



Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote:

  

From: Jonathan Rochkind rochk...@jhu.edu
Subject: Re: How to import data with a different date format
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date: Wednesday, September 8, 2010, 10:27 AM
Just throwing it out there, I'd
consider a different approach for an actual real app,
although it might not be easier to get up quickly. (For
quickly, yeah, I'd just store it as a string, more on that
at bottom).

If none of your dates have times, they're all just full
days, I'm not sure you really need the date type at all.

Convert the date to number-of-days since epoch
integer.  (Most languages will have a way to do this,
but I don't know about pure XSLT).  Store _that_ in a
1.4 'int' field.  On top of that, make it a tint
(precision non-zero) for faster range queries.

But now your actual interface will have to convert from
number of days since epoch to a displayable date. (And if
you allow user input, convert the input to
number-of-days-since-epoch before making a range query or
fq, but you'd have to do that anyway even with solr dates,
users aren't going to be entering W3CDate raw, I don't
think).

That is probably the most efficient way to have solr handle
it -- using an actual date field type gives you a lot more
precision than you need, which is going to hurt performance
on range queries. Which you can compensate for with trie
date sure, but if you don't really need that precision to
begin with, why use it?  Also the extra precision can
end up doing unexpected things and making it easier to have
bugs (range queries on that high precision stuff, you need
to make sure your start date has 00:00:00 set and your end
date has 23:59:59 set, to do what you probably expect). If
you aren't going to use the extra precision, makes
everything a lot simpler to not use a date field.

Alternately, for your get this done quick method, yeah,
I'd just store it as a string. With a string exactly as
you've specified, sorting and range queries won't work how
you'd want.  But if you can make it a string of the
 format yyyy/mm/dd instead (always two-digit month and
 day), then you can even sort and do range queries on your
string dates. For the quick and dirty prototype, I'd just do
that.  In fact, while this might make range queries and
sorting _slightly_ slower than if you use an int or a 

Re: How to import data with a different date format

2010-09-08 Thread Dennis Gearon
So now, vs when 'trie' came out, Solr has an INT field that IS 'trie', right?

And nothing date/timestamp related has come out since, making 'trie'/INT the 
field of choice for timestamps, right? 

Seems like the fastest choice.

I will have to read up on it.

Seems like my original choice to use unix timestamp as storage in my SQL 
database, vs native Postgres timestamp, will make everything easier between:
  PHP
  Symfony
  Postgres
  Solr

It's probably going to be a good idea to store two other columns in the search 
index for display, 'date', 'time'. That is, unless I force the user's 
javascript to generate the time and date from the unix timestamp. hmm.
  
Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote:

 From: Jonathan Rochkind rochk...@jhu.edu
 Subject: Re: How to import data with a different date format
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Wednesday, September 8, 2010, 11:35 AM
 So the standard 'int' field in Solr
 1.4 is a trie based field, although the example int type
 in the default solrconfig.xml has a precision set to 0,
 which means it's not really doing trie things. If you set
 the precision to something greater than 0, as in the default
 example tint type, then it's really using 'trie'
 functionality.  'trie' functionality speeds up range
 queries by putting each value into 'buckets' (my own term),
 per the precision specified, so solr has to do less to grab
 all values within a certain range.
 
 That's all tint/non-zero-precision-trie does, speed up
 range queries. Your use case involves range queries though,
 so it's worth investigating.  If you use a string or
 other textual type for sorting or range queries, you need to
 make sure your values sort the way you want them to as
 strings. But yyyy-mm-dd will.
 
 More on trie: 
 http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
 
 I think there probably won't be much of a difference at
 query time between non-trie int and string, although I'm not
 sure, and it may depend on the nature of your data and
 queries.   Using a trie int will be faster
 for (and only for) range queries, if you have a lot of data.
 (There are some cases, depending on the data and the nature
 of your queries, where the overhead of a non-zero-precision
 trie may outweigh the hypothetical gain, but generally it's
 faster). 
 I don't think there should be any appreciable difference
 between how long a non-trie int or a string will take to
 index -- at least as far as solr is concerned, if your app
 preparing the documents for solr takes longer to prepare one
 than another, that's another story. An actual trie
 (non-zero-precision) theoretically has indexing-time
 overhead, but I doubt it would be noticeable, unless you
 have a really really lean mean indexing setup where ever
 microsecond counts.
 
 Jonathan
 
 Dennis Gearon wrote:
  I'm doing something similar for
 dates/times/timestamps.
  
  I'm actually trying to do, 'now' is within the range
 of what appointments(date/time from and to combos, i.e.
 timestamps).
  
  Fairly simple search of:
  
     What items have a start time BEFORE now,
 and an end time AFTER now?
  
  My thoughts were to store:
    unix time stamp BIGINTS (64 bit)
    ISO_DATE ISO_TIME strings
  
  Which is going to be faster:
     1/ Indexing?
     2/ Searching?
  
  How does the 'tint' field mentioned below apply?
  
  
  
  Dennis Gearon
  
  Signature Warning
  
  EARTH has a Right To Life,
    otherwise we all die.
  
  Read 'Hot, Flat, and Crowded'
  Laugh at http://www.yert.com/film.php
  
  
  --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu
 wrote:
  
    
  From: Jonathan Rochkind rochk...@jhu.edu
  Subject: Re: How to import data with a different
 date format
  To: solr-user@lucene.apache.org
 solr-user@lucene.apache.org
  Date: Wednesday, September 8, 2010, 10:27 AM
  Just throwing it out there, I'd
  consider a different approach for an actual real
 app,
  although it might not be easier to get up quickly.
 (For
  quickly, yeah, I'd just store it as a string, more
 on that
  at bottom).
  
  If none of your dates have times, they're all just
 full
  days, I'm not sure you really need the date type
 at all.
  
  Convert the date to number-of-days since epoch
  integer.  (Most languages will have a way to
 do this,
  but I don't know about pure XSLT).  Store
 _that_ in a
  1.4 'int' field.  On top of that, make it a
 tint
  (precision non-zero) for faster range queries.
  
  But now your actual interface will have to convert
 from
  number of days since epoch to a displayable
 date. (And if
  you allow user input, convert the input to
  number-of-days-since-epoch before making a range
 query or
  fq, but you'd have to do that anyway even with
 

RE: Re: Re: Invariants on a specific fq value

2010-09-08 Thread Chris Hostetter

: Sounds great! I'll be very sure to put it to the test tomorrow and 
: perhaps add documentation on these types to the solrconfigxml wiki page 
: for reference.

SolrConfigXml wouldn't really be an appropriate place to document this 
-- it's not a general config item, it's a feature of the SearchHandler...

   http://wiki.apache.org/solr/SearchHandler

That wiki page already documented "defaults", i've updated it to add 
details on "appends" and "invariants".


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



RE: Re: Re: Invariants on a specific fq value

2010-09-08 Thread Markus Jelsma
Excellent! You already made my day for tomorrow! I'll check its behavior with 
fq parameters specifying a filter for the same field!
-Original message-
From: Chris Hostetter hossman_luc...@fucit.org
Sent: Wed 08-09-2010 21:04
To: solr-user@lucene.apache.org; 
Subject: RE: Re: Re: Invariants on a specific fq value


: Sounds great! I'll be very sure to put it to the test tomorrow and 
: perhaps add documentation on these types to the solrconfigxml wiki page 
: for reference.

SolrConfigXml wouldn't really be an appropriate place to document this 
-- it's not a general config item, it's a feature of the SearchHandler...

  http://wiki.apache.org/solr/SearchHandler

That wiki page already documented defaults, i've updated it to add 
details on appends and invariants.


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!



Re: How to import data with a different date format

2010-09-08 Thread Chris Hostetter

: If none of your dates have times, they're all just full days, I'm not sure you
: really need the date type at all.
: 
: Convert the date to number-of-days since epoch integer.  (Most languages will
: have a way to do this, but I don't know about pure XSLT).  Store _that_ in a
: 1.4 'int' field.  On top of that, make it a tint (precision non-zero) for
: faster range queries.

There's really no advantage to doing this over using the TrieDateField 
(available in Solr 1.4).  It's essentially how it's implemented under the 
covers (you can pick the precision just like TrieInt) except that:

1) it uses a long instead of an int
2) it supports DateMath expressions
3) it supports Date Faceting

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Randomly slow response times for range queries

2010-09-08 Thread oleg.gnatovskiy

Hello all,

I am running two range queries on a double value as filter queries using
Solr 1.4, and for the most part am getting great performance (qTime <
100ms). However, at certain QPS, I start getting very slow queries
(2000+ms). I've tried this using the new trie fields, and using standard
sdouble fields, and have had similar results. Is there a known issue with
randomly slow queries when doing range searches with Solr?

Thanks for any support you can offer.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Randomly-slow-response-times-for-range-queries-tp1441724p1441724.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is semicolon a character that needs escaping?

2010-09-08 Thread Chris Hostetter

: I am using 1.3 without a sort param which explains it, I think. It would
: be nice to update to 1.4 but we try to avoid such actions on a
: production server as long as everything runs fine (the semicolon thing
: was only reported recently).

if you don't currently use sort at all, then adding a default sort param 
of "score desc" to your solr config for that handler means you shouldn't have 
to ever worry about semicolons again.

(i'm fairly certain Solr 1.3 supported "defaults" - i may be wrong ... you 
might have to add that hardcoded sort param in your client)
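
In solrconfig.xml terms, such a default would look roughly like this (a sketch based on the stock example handler; untested against 1.3):

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- applied whenever the client sends no sort param -->
    <str name="sort">score desc</str>
  </lst>
</requestHandler>
```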


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: How to use TermsComponent when I need a filter

2010-09-08 Thread Chris Hostetter

: Subject: How to use TermsComponent when I need a filter
: In-Reply-To: 8ffbbf6788bd5842b5a7274ef0f6837e01c3d...@msex85.morningstar.com
: References: 8ffbbf6788bd5842b5a7274ef0f6837e01c3d...@msex85.morningstar.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



How to use TermsComponent when I need a filter

2010-09-08 Thread David Yang
Hi,

 

I have a solr index, which for simplicity is just a list of names, and a
list of associations. (either a multivalue field e.g. {A1, A2, A3, A6}
or a string concatenation list e.g. A1 A2 A3 A6)

 

I want to be able to provide autocomplete but with a specific
association. E.g. Names beginning with Bob in association A5. 

 

Is this possible? I would prefer not to have to have one index per
association, since the number of associations is pretty large

 

Cheers,

 

David 
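
(Not from this thread, but for reference:) one common way to get filtered autocomplete without one index per association is faceting with facet.prefix rather than TermsComponent — a request of roughly this shape, reusing the field names from the message (host and handler are illustrative):

```
http://localhost:8983/solr/select?q=association:A5&rows=0&facet=true&facet.field=name&facet.prefix=Bob
```

The fq/q restriction limits the documents considered, and facet.prefix restricts the returned name terms to those starting with "Bob".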

 



Re: Randomly slow response times for range queries

2010-09-08 Thread Erick Erickson
Well, throw enough queries at any server and it'll slow right down, so
how many are we talking here?

But no, there're no SOLR issues like this that I know of. That said, you
could be getting cache thrashing. You could be getting garbage collection
by the JVM. You could be executing commits somehow (are you
updating?) and causing your caches to be refilled. You could be...

The admin/stats.jsp page (also linked from the admin page) can give you
some clues, look particularly for evictions most of the way down the page.

Best
Erick

On Wed, Sep 8, 2010 at 3:11 PM, oleg.gnatovskiy crooke...@gmail.com wrote:


 Hello all,

 I am running two range queries on a double value as filter queries using
 Solr 1.4, and for the most part am getting great performance (qTime <
 100ms). However, at certain QPS, I start getting very slow queries
 (2000+ms). I've tried this using the new trie fields, and using standard
 sdouble fields, and have had similar results. Is there a known issue with
 randomly slow queries when doing range searches with Solr?

 Thanks for any support you can offer.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Randomly-slow-response-times-for-range-queries-tp1441724p1441724.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: DataImportHandlerException for custom DIH Transformer

2010-09-08 Thread Vladimir Sutskever
I am experiencing a similar situation.

Any comments?


-Original Message-
From: Shashikant Kore [mailto:shashik...@gmail.com] 
Sent: Wednesday, September 08, 2010 2:54 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandlerException for custom DIH Transformer

Resurrecting an old thread.

I faced exact problem as Tommy and the jar was in {solr.home}/lib as Noble
had suggested.

My custom transformer overrides following method as per the specification of
Transformer class.

public Object transformRow(Map<String, Object> row, Context
context);

But, in the code (EntityProcessorWrapper.java), I see the following line.

  final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class);

This doesn't match the method signature in Transformer. I think this should
be

  final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class,
Context.class);

I have verified that adding a method transformRow(Map<String, Object> row)
works.

Am I missing something?
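
The mismatch can be reproduced outside Solr with plain reflection; a minimal sketch (the Context interface here is a stand-in, not DIH's real class):

```java
import java.util.Map;

public class TransformRowLookup {
    // Stand-in for DIH's Context interface; only the signature matters here.
    interface Context {}

    // A transformer following the documented two-argument contract.
    static class MyTransformer {
        public Object transformRow(Map<String, Object> row, Context ctx) {
            return row;
        }
    }

    // True if the class declares a public transformRow with exactly these parameter types.
    static boolean hasTransformRow(Class<?> clazz, Class<?>... paramTypes) {
        try {
            clazz.getMethod("transformRow", paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The one-argument lookup used by EntityProcessorWrapper misses the two-argument method:
        System.out.println(hasTransformRow(MyTransformer.class, Map.class));                 // false
        // The lookup matching the documented signature succeeds:
        System.out.println(hasTransformRow(MyTransformer.class, Map.class, Context.class));  // true
    }
}
```

This is why the one-arg transformRow(java.util.Map) variant works while the documented two-arg signature throws NoSuchMethodException under the one-arg lookup.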

--shashi

2010/2/8 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 On Mon, Feb 8, 2010 at 9:13 AM, Tommy Chheng tommy.chh...@gmail.com
 wrote:
   I'm having trouble making a custom DIH transformer in solr 1.4.
 
  I compiled the General TrimTransformer into a jar. (just copy/paste
 sample
  code from http://wiki.apache.org/solr/DIHCustomTransformer)
  I placed the jar along with the dataimporthandler jar in solr/lib (same
  directory as the jetty jar)

 do not keep it in solr/lib, it won't work. keep it in {solr.home}/lib
 
  Then I added to my DIH data-config.xml file:
  transformer="DateFormatTransformer, RegexTransformer,
  com.chheng.dih.transformers.TrimTransformer"
 
  Now I get this exception when I try running the import.
  org.apache.solr.handler.dataimport.DataImportHandlerException:
  java.lang.NoSuchMethodException:
  com.chheng.dih.transformers.TrimTransformer.transformRow(java.util.Map)
 at
 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.loadTransformers(EntityProcessorWrapper.java:120)
 
   I noticed the exception lists TrimTransformer.transformRow(java.util.Map)
   but the abstract Transformer class defines a two parameter method:
   transformRow(Map<String, Object> row, Context context)?
 
 
  --
  Tommy Chheng
  Programmer and UC Irvine Graduate Student
  Twitter @tommychheng
  http://tommy.chheng.com
 
 

 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com


This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  

Re: Implementing synonym NewBie

2010-09-08 Thread Lance Norskog
I believe the synonym filter does not find phrases, only individual words.

It is possible that you could use the Shingle tools to create terms
that are word pairs. This would be very inefficient.
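
For reference, equivalent phrase entries in synonyms.txt use the comma-separated form (entries taken from Jonty's example; as noted above, multi-word behavior has caveats):

```
live show, live show in new york, live show in california, live show in dc, live show in usa
```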

On Tue, Sep 7, 2010 at 6:23 AM, Jak Akdemir jakde...@gmail.com wrote:
 If you think to improve your synonyms file by time I would recommend you
 query time indexing. By the way you don't have to re-index when you need to
 add something more.

 On Sat, Aug 28, 2010 at 10:01 AM, Jonty Rhods jonty.rh...@gmail.com wrote:

 Hi All,

 I want to use synonym for my search.
 Still I am in learning phase of solr. So please help me to implement
 synonym
 in my search.
  according to wiki synonym can be implemented in two ways.
  1 at index time
  2 at search time
 
  I have a combination of 10 phrases for synonyms, so which will be better in my
  case?
  something like: live show in new york = live show in california = live show
  = live show in DC = live show in USA
  will synonyms affect my original search?

 thanks
 with regards
 Jonty





-- 
Lance Norskog
goks...@gmail.com


Re: How to retrieve the full corpus

2010-09-08 Thread Lance Norskog
If you want to do a mass scan of an index, the most scalable way is to
make a variation of the Lucene CheckIndex program. Unfortunately,
CheckIndex does not know any of the Solr types.

But first, you should try the above techniques because they are much
much easier.
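
For the words-with-frequencies part, Solr 1.4's TermsComponent is one of the lighter options; a request sketch (core URL and field name are illustrative, and terms.limit is raised from its small default):

```
http://localhost:8983/solr/terms?terms.fl=text&terms.limit=1000&terms.sort=count
```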

On Mon, Sep 6, 2010 at 7:59 AM, Markus Jelsma markus.jel...@buyways.nl wrote:
 You can use Luke to inspect a Lucene index. Check the schema browser in your
 Solr admin interface for an example.

 On Monday 06 September 2010 16:52:03 Roland Villemoes wrote:
 Hi All,

 How can I retrieve all words from a Solr core?
 I need a list of all the words and how often they occur in the index.

 med venlig hilsen/best regards

 Roland Villemoes
 Tel: (+45) 22 69 59 62
 E-Mail: mailto:r...@alpha-solutions.dk

 Alpha Solutions A/S
 Borgergade 2, 3.sal, 1300 København K
 Tel: (+45) 70 20 65 38
 Web: http://www.alpha-solutions.dk

 ** This message including any attachments may contain confidential and/or
  privileged information intended only for the person or entity to which it
  is addressed. If you are not the intended recipient you should delete this
  message. Any printing, copying, distribution or other use of this message
  is strictly prohibited. If you have received this message in error, please
  notify the sender immediately by telephone, or e-mail and delete all
  copies of this message and any attachments from your system. Thank you.


 Markus Jelsma - Technisch Architect - Buyways BV
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350





-- 
Lance Norskog
goks...@gmail.com


Re: How to import data with a different date format

2010-09-08 Thread Jonathan Rochkind

Solr 1.4 was the first tagged release with trie fields.

And Solr 1.4+ also includes a 'date' field based on 'trie' just for 
dates.  If your dates are actually going to include hour/minute/second, 
not just calendar day-of-month, then I'd definitely use the built in 
solr trie date field, that's what it's for, will do the translation from 
calendar date-time to integer for you (in both directions), and add trie 
buckets for fast range querying too.


I was suggesting that just using 'int' might be simpler if you don't 
need hour/minute/second precision, but are just storing year-month-day. 
If you've got hour-minute-second too, no reason not to use Solr's date 
type, and lots of reasons to do so.


Jonathan

Dennis Gearon wrote:

So now, vs when 'trie' came out, Solr has an INT field that IS 'trie', right?

And nothing date/timestamp related has come out since, making 'trie'/INT the 
field of choice for timestamps, right?

Seems like the fastest choice.

I will have to read up on it.

Seems like my original choice to use unix timestamp as storage in my SQL 
database, vs native Postgres timestamp, will make everything easier between:
  PHP
  Symfony
  Postgres
  Solr

It's probably going to be a good idea to store two other columns in the search 
index for display, 'date', 'time'. That is, unless I force the user's 
javascript to generate the time and date from the unix timestamp. hmm.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote:

  

From: Jonathan Rochkind rochk...@jhu.edu
Subject: Re: How to import data with a different date format
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date: Wednesday, September 8, 2010, 11:35 AM
So the standard 'int' field in Solr
1.4 is a trie based field, although the example int type
in the default solrconfig.xml has a precision set to 0,
which means it's not really doing trie things. If you set
the precision to something greater than 0, as in the default
example tint type, then it's really using 'trie'
functionality.  'trie' functionality speeds up range
queries by putting each value into 'buckets' (my own term),
per the precision specified, so solr has to do less to grab
all values within a certain range.

That's all tint/non-zero-precision-trie does, speed up
range queries. Your use case involves range queries though,
so it's worth investigating.  If you use a string or
other textual type for sorting or range queries, you need to
make sure your values sort the way you want them to as
strings. But yyyy-mm-dd will.

More on trie: 
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/

I think there probably won't be much of a difference at
query time between non-trie int and string, although I'm not
sure, and it may depend on the nature of your data and
queries.   Using a trie int will be faster
for (and only for) range queries, if you have a lot of data.
(There are some cases, depending on the data and the nature
of your queries, where the overhead of a non-zero-precision
trie may outweigh the hypothetical gain, but generally it's
faster).
I don't think there should be any appreciable difference
between how long a non-trie int or a string will take to
index -- at least as far as solr is concerned, if your app
preparing the documents for solr takes longer to prepare one
than another, that's another story. An actual trie
(non-zero-precision) theoretically has indexing-time
overhead, but I doubt it would be noticeable, unless you
have a really really lean mean indexing setup where every
microsecond counts.

Jonathan

Dennis Gearon wrote:


I'm doing something similar for
  

dates/times/timestamps.


I'm actually trying to do, 'now' is within the range
  

of what appointments(date/time from and to combos, i.e.
timestamps).


Fairly simple search of:

   What items have a start time BEFORE now,
  

and an end time AFTER now?


My thoughts were to store:
  unix time stamp BIGINTS (64 bit)
  ISO_DATE ISO_TIME strings

Which is going to be faster:
   1/ Indexing?
   2/ Searching?

How does the 'tint' field mentioned below apply?



Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu
  

wrote:

  

From: Jonathan Rochkind rochk...@jhu.edu
Subject: Re: How to import data with a different


date format


To: solr-user@lucene.apache.org


solr-user@lucene.apache.org


Date: Wednesday, September 8, 2010, 10:27 AM
Just throwing it out there, I'd
consider a different approach for an actual real


app,


although it might not be easier to get up quickly.


(For


quickly, yeah, 

Delta Import with something other than Date

2010-09-08 Thread David Yang
Hi,

I have a table that I want to index, and the table has no datetime
stamp. However, the table is append only so the primary key can only go
up. Is it possible to store the last primary key, and use some delta
query=select id where id > ${last_id_value}

Cheers,

David



Re: Null Pointer Exception with shards+facets where some shards have no values for some facets.

2010-09-08 Thread Ron Mayer
Yonik Seeley wrote:
 On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer r...@0ape.com wrote:
 Short summary:
  * Mixing Facets and Shards give me a NullPointerException
when not all docs have all facets.
 
 https://issues.apache.org/jira/browse/SOLR-2110
 
 I believe the underlying real issue stemmed from your use of a complex
 key involvement/race_facet.

Thanks! Yes - that looks like the actual reason, rather than what
I was guessing. I spent a while this morning trying to reproduce the
problem with a simpler example, and wasn't able to - probably because
I overlooked that part.


I see changes have been made (based on comments in) SOLR-2110 and
SOLR-2111, so I'll try with the current trunk..
 [trying now with trunk as of a few minutes ago]
Looking much better.

I'm seeing this in the log files:
SEVERE: Exception during facet.field of 
{!terms=$involvement/gender_facet__terms}involvement/gender_facet:
org.apache.lucene.queryParser.ParseException: Expected identifier at pos 20 
str='{!terms=$involvement/gender_facet__terms}involvement/gender_facet'
at 
org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:718)
at 
org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:165)
...
but at least I'm getting results, and results that look right for both the body 
of
the document and for most of the facets.

Perhaps next thing I try will be simplifying my keys for my own sanity as much
as for solr's.


Re: Delta Import with something other than Date

2010-09-08 Thread Jonathan Rochkind
Of course you can store whatever you want in a solr index. And if you 
store an integer as a Solr 1.4 int type, you can certainly query for 
all documents that have a value greater than some specified integer in a field.


You can't use SQL to query Solr though.

I'm not sure what you're really asking?

Jonathan

David Yang wrote:

Hi,

I have a table that I want to index, and the table has no datetime
stamp. However, the table is append only so the primary key can only go
up. Is it possible to store the last primary key, and use some delta
query=select id where id > ${last_id_value}

Cheers,

David


  


RE: Delta Import with something other than Date

2010-09-08 Thread David Yang
Currently DIH delta import uses the SQL query of type select id from
item where last_modified > ${dataimporter.last_index_time}
What I need is some field like ${dataimporter.last_primary_key}
wiki.apache.org/solr/DataImportHandler
I am thinking of storing the last primary key externally and calling the
delta-import with a parameter and using
${dataimporter.request.last_primary_key} but that seems like a very
brittle approach
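
A sketch of the request-parameter variant described above, in DIH data-config.xml terms (table, column, and parameter names are illustrative; the last id would be passed on the delta-import request as an extra parameter, e.g. &last_id=1234):

```xml
<entity name="item" pk="id"
        query="select * from item"
        deltaQuery="select id from item where id &gt; '${dataimporter.request.last_id}'"
        deltaImportQuery="select * from item where id = '${dataimporter.delta.id}'">
  <field column="id" name="id"/>
</entity>
```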

Cheers,
David

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Wednesday, September 08, 2010 6:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Delta Import with something other than Date

Of course you can store whatever you want in a solr index. And if you 
store an integer as a Solr 1.4 int type, you can certainly query for 
all documents that have a value greater than some specified integer in a field.

You can't use SQL to query Solr though.

I'm not sure what you're really asking?

Jonathan

David Yang wrote:
 Hi,

 I have a table that I want to index, and the table has no datetime
 stamp. However, the table is append only so the primary key can only
go
 up. Is it possible to store the last primary key, and use some delta
 query=select id where id > ${last_id_value}

 Cheers,

 David


   


Need Advice for Finding Freelance Solr Expert

2010-09-08 Thread John Roberts
Hi,

We need someone who knows Solr to help us prepare and index some data. Any
advice on where to find people who know Solr?

Thanks,
John



Re: Randomly slow response times for range queries

2010-09-08 Thread oleg.gnatovskiy

Well I am only sending about 50 QPS at it at the time that it temporarily
slows down, and then it's able to get all the way up to 100 QPS+ with no
problems (until the next random queries). I suppose it could be the garbage
collection. Is there a good way to limit this?
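
Garbage collection can't be eliminated, but on the Sun JVMs of this era the concurrent collector usually smooths out the pauses; a launch-command sketch (heap sizes and jar name are illustrative, the flags are standard HotSpot options):

```
java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -verbose:gc -XX:+PrintGCDetails -jar start.jar
```

The GC logging flags help confirm whether the slow queries line up with collection pauses before tuning further.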
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Randomly-slow-response-times-for-range-queries-tp1441724p1443086.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Randomly slow response times for range queries

2010-09-08 Thread oleg.gnatovskiy

Also, does anyone know the best precisionStep to use on a trie field (float)
definition to achieve optimal performance?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Randomly-slow-response-times-for-range-queries-tp1441724p1443096.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need Advice for Finding Freelance Solr Expert

2010-09-08 Thread Dennis Gearon
There's a page on the Solr/Lucene site for this.

I myself will be in the market for one late this year.
Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, John Roberts jsro...@hotmail.com wrote:

 From: John Roberts jsro...@hotmail.com
 Subject: Need Advice for Finding Freelance Solr Expert
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 8, 2010, 3:50 PM
 Hi,
 
 We need someone who knows Solr to help us prepare and index
 some data. Any
 advice on where to find people who know Solr?
 
 Thanks,
 John
 
 


Re: How to import data with a different date format

2010-09-08 Thread Dennis Gearon
I already have the issue of how to store between different databases, 
languages, platforms, and frameworks.

Settling on LONGINT/unix timestamp solves the problem on all fronts.

I may even send them to the browser and have the JScript convert them to 
date/times (maybe ;-)

So, it's *nix timestamp or bust!

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote:

 From: Jonathan Rochkind rochk...@jhu.edu
 Subject: Re: How to import data with a different date format
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Wednesday, September 8, 2010, 3:07 PM
 Solr 1.4 was the first tagged release
 with trie fields.
 
 And Solr 1.4+ also includes a 'date' field based on 'trie'
 just for 
 dates.  If your dates are actually going to include
 hour/minute/second, 
 not just calendar day-of-month, then I'd definitely use the
 built in 
 solr trie date field, that's what it's for, will do the
 translation from 
 calendar date-time to integer for you (in both directions),
 and add trie 
 buckets for fast range querying too.
 
 I was suggesting that just using 'int' might be simpler if
 you don't 
 need hour/minute/second precision, but are just storing
 year-month-day. 
 If you've got hour-minute-second too, no reason not to use
 Solr's date 
 type, and lots of reasons to do so.
 
 Jonathan
 
 Dennis Gearon wrote:
  So now, vs when 'trie' came out, Solr has an INT field
 that IS 'trie', right?
 
  And nothing date/timestamp related has come out since,
 making 'trie'/INT the field of choice for timestamps,
 right?
 
  Seems like the fastest choice.
 
  I will have to read up on it.
 
  Seems like my original choice to use unix timestamp as
 storage in my SQL database, vs native Postgres timestamp,
 will make everything easier between:
    PHP
    Symfony
    Postgres
    Solr
 
  It's probably going to be a good idea to store two
 other columns in the search index for display, 'date',
 'time'. That is, unless I force the user's javascript to
 generate the time and date from the unix timestamp.
 hmm.
 
  Dennis Gearon
 
  Signature Warning
  
  EARTH has a Right To Life,
    otherwise we all die.
 
  Read 'Hot, Flat, and Crowded'
  Laugh at http://www.yert.com/film.php
 
 
  --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu
 wrote:
 
    
  From: Jonathan Rochkind rochk...@jhu.edu
  Subject: Re: How to import data with a different
 date format
  To: solr-user@lucene.apache.org
 solr-user@lucene.apache.org
  Date: Wednesday, September 8, 2010, 11:35 AM
  So the standard 'int' field in Solr
  1.4 is a trie based field, although the example
 int type
  in the default solrconfig.xml has a precision
 set to 0,
  which means it's not really doing trie things.
 If you set
  the precision to something greater than 0, as in
 the default
  example tint type, then it's really using
 'trie'
  functionality.  'trie' functionality speeds
 up range
  queries by putting each value into 'buckets' (my
 own term),
  per the precision specified, so solr has to do
 less to grab
  all values within a certain range.
 
  That's all tint/non-zero-precision-trie does,
 speed up
  range queries. Your use case involves range
 queries though,
  so it's worth investigating.  If you use a
 string or
  other textual type for sorting or range queries,
 you need to
  make sure your values sort the way you want them
 to as
  strings. But yyyy-mm-dd will.
 
  More on trie: 
  http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
 
  I think there probably won't be much of a
 difference at
  query time between non-trie int and string,
 although I'm not
  sure, and it may depend on the nature of your data
 and
  queries.   Using a trie int will be
 faster
  for (and only for) range queries, if you have a
 lot of data.
  (There are some cases, depending on the data and
 the nature
  of your queries, where the overhead of a
 non-zero-precision
  trie may outweigh the hypothetical gain, but
 generally it's
  faster).
  I don't think there should be any appreciable
 difference
  between how long a non-trie int or a string will
 take to
  index -- at least as far as solr is concerned, if
 your app
  preparing the documents for solr takes longer to
 prepare one
  than another, that's another story. An actual
 trie
  (non-zero-precision) theoretically has
 indexing-time
  overhead, but I doubt it would be noticeable,
 unless you
  have a really really lean mean indexing setup
  where every
  microsecond counts.
 
  Jonathan
 
  Dennis Gearon wrote:
      
  I'm doing something similar for
        
  dates/times/timestamps.
      
  I'm actually trying to do, 'now' is within
 the range
        
  of what appointments(date/time from and to combos,
 i.e.
  timestamps).
      
  Fairly simple search of:
 
   

Re: Null Pointer Exception with shards+facets where some shards have no values for some facets.

2010-09-08 Thread Yonik Seeley
I just checked in the last part of those changes that should eliminate
any restriction on key.
But, that last part dealt with escaping keys that contained whitespace or }
Your example really should have worked after my previous 2 commits.
Perhaps not all of the servers got successfully upgraded?
Can you try trunk again now?

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8

On Wed, Sep 8, 2010 at 6:28 PM, Ron Mayer r...@0ape.com wrote:
 Yonik Seeley wrote:
 On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer r...@0ape.com wrote:
 Short summary:
  * Mixing Facets and Shards give me a NullPointerException
    when not all docs have all facets.

 https://issues.apache.org/jira/browse/SOLR-2110

 I believe the underlying real issue stemmed from your use of a complex
 key involvement/race_facet.

 Thanks!    Yes - that looks like the actual reason, rather than what
 I was guessing. I spent a while this morning trying to reproduce the
 problem with a simpler example, and wasn't able to - probably because
 I overlooked that part.


 I see changes have been made (based on comments in) SOLR-2110 and
 SOLR-2111, so I'll try with the current trunk..
     [trying now with trunk as of a few minutes ago]
 Looking much better.

 I'm seeing this in the log files:
 SEVERE: Exception during facet.field of 
 {!terms=$involvement/gender_facet__terms}involvement/gender_facet:org.a
 pache.lucene.queryParser.ParseException: Expected identifier at pos 20 
 str='{!terms=$involvement/gender_facet__
 terms}involvement/gender_facet'
        at 
 org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:718)
        at 
 org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:165)
        ...
 but at least I'm getting results, and results that look right for both the 
 body of
 the document and for most of the facets.

 Perhaps next thing I try will be simplifying my keys for my own sanity as much
 as for solr's.



Re: Null Pointer Exception with shards+facets where some shards have no values for some facets.

2010-09-08 Thread Ron Mayer
Yonik Seeley wrote:
 I just checked in the last part of those changes that should eliminate
 any restriction on key.
 But, that last part dealt with escaping keys that contained whitespace or }
 Your example really should have worked after my previous 2 commits.
 Perhaps not all of the servers got successfully upgraded?

Yes, quite possible.

 Can you try trunk again now?

Will check sometime tomorrow.


Re: Creating a sub-index from another

2010-09-08 Thread Chris Hostetter

: I have a Solr Index with several million documents. I need to implement some
: text mining processes and I would like to create a million documents index
: from the original for some tests.

Which million documents do you want?

If you're just looking for a one time kind of experimental test index...

1) take a snapshot of your live index
2) copy it onto your dev machine
3) load it into Solr
4) execute delete commands (according to some criteria you choose) until 
you only have 1 million documents left.
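
Step 4 can be done with Solr's XML update messages; a delete-by-query sketch (the field name and cutoff are illustrative):

```xml
<!-- sent as one update request -->
<delete><query>timestamp:[* TO 2009-01-01T00:00:00Z]</query></delete>
<!-- then a commit in a second request -->
<commit/>
```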


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Solr Highlighting Question

2010-09-08 Thread Jed Glazner




Anybody?

On 09/08/2010 11:26 AM, Jed Glazner wrote:

  Thanks for taking time to read through this.  I'm using a checkout from

the solr 3.x branch

My problem is with the highlighter and wildcards

I can get the highlighter to work with wild cards just fine, the problem
is that  solr is returning the term matched, when what I want it to do
is highlight the chars in the term that were matched.


Example:

http://192.168.1.75:8983/solr/music/select?indent=on&q=name_title:wel*&qt=beyond&hl=true&hl.fl=name_title&f.name_title.hl.usePhraseHighlighter=true&f.name_title.hl.highlightMultiTerm=true

The results that come back look like this:

<em>Welcome</em> to the Jungle

What I want them to look like is this:
<em>Wel</em>come to the Jungle

   From what I gathered by searching the archives, solr 1.1 used to
do this... Is there a way to get that functionality?

Thanks!

  



-- 

This email and its attachments (if any) are for the sole use of the
intended recipient, and may contain private, confidential, and
privileged material. Any review, copying, or distribution of this
email, its attachments or the information contained herein is strictly
prohibited. If you are not the intended recipient, please contact the
sender immediately and permanently delete the original and any copies
of this email and any attachments.






[ANN] Webinar, Sep 15: Mastering the Power of Faceted Search

2010-09-08 Thread Yonik Seeley
Folks, here's an upcoming Solr webinar sponsored by my employer.
It's Hoss on faceting, so it should be good!

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8

--- Webinar Details

Join us for a free webcast

Mastering the Power of Faceted Search
with Chris Hostetter

Wednesday, September 15, 2010
9:00 AM PST / 12:00 PM EST / 17:00 GMT

Click here to sign up
http://www.eventsvc.com/lucidimagination/event/f5d87726f8ab4ed4911aad605f94f455?trk=AP


Few search features have contributed as much to findability and user
search experience as Facets. By organizing and classifying underlying
information into an intuitive method for filtering information,
faceted searching gives users a powerful tool for navigation and
discovery.

Once the province of costly proprietary commercial systems, the
Faceted Searching capabilities of the Apache Solr Open Source Search
have led developers around the world to build this popular feature
into their search apps. Yet many of its more powerful features are not
as widely known and used, and offer yet more powerful improvements to
the search experience.

Join Apache Lucene/Solr committer Chris Hostetter of Lucid Imagination
for an in depth technical workshop on the what, why and how of
faceting with Solr, the Lucene Search Server. This presentation will
cover:
* the different types of facets that Solr supports
* techniques for dealing with complex faceting use cases
* performance factors to be aware of
* new faceting features on the horizon
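For readers new to faceting, the basic field facet from the first bullet is just a couple of request parameters; a minimal sketch (the field name "category" is an assumption):

```python
from urllib.parse import urlencode

# Parameters for a plain field facet; Solr returns a count per distinct
# value of the faceted field alongside the normal results.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "category"),
    ("facet.mincount", "1"),  # hide zero-count values
]
query_string = urlencode(params)
# send as http://localhost:8983/solr/select?<query_string>
```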


About the presenter:
Chris Hossman Hostetter is a Member of the Apache Software
Foundation, and serves on the Lucene Project Management Committee.
Prior to joining Lucid Imagination in 2010 to work full time on Solr
development, he spent 11 years as Principal Software Engineer for CNET
Networks, thinking about searching structured data that was never as
structured as it should have been.


Re: Multi core schema file

2010-09-08 Thread Lance Norskog
A demonstration of this feature would be a good addition to the
example/multicore directory.

On Wed, Sep 8, 2010 at 3:45 AM, Grijesh.singh pintu.grij...@gmail.com wrote:

 solr.xml allows you to mention the other properties as well like
 instanceDir, config,schema in the cores/core tag

 So , sharing the entire conf dir may not be possible , but it is
 possible to share solrconfig.xml  and schema.xml

 You can see the detailed parameters at the wiki page
 http://wiki.apache.org/solr/CoreAdmin

 -
 Grijesh
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Multi-core-schema-file-tp1438460p1438720.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com
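A hypothetical solr.xml along these lines (all names and paths are illustrative only) could point two cores at the same solrconfig.xml and schema.xml:

```xml
<!-- Hypothetical solr.xml: each core keeps its own instanceDir, but the
     config and schema attributes point both cores at shared files. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"
          config="../shared/solrconfig.xml" schema="../shared/schema.xml"/>
    <core name="core1" instanceDir="core1"
          config="../shared/solrconfig.xml" schema="../shared/schema.xml"/>
  </cores>
</solr>
```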


Re: Query result ranking - Score independent

2010-09-08 Thread Lance Norskog
Generally speaking it is a bad idea to change the schema without
reindexing. I found several little things that could go wrong back
when I had a huge index and could not reindex.

On Wed, Sep 8, 2010 at 4:58 AM, Erick Erickson erickerick...@gmail.com wrote:
 Ooops, hit send too quickly. Could you show us the entire URL you send
 that produces the error?

 Erick

 On Wed, Sep 8, 2010 at 7:58 AM, Erick Erickson erickerick...@gmail.comwrote:

 The change in the schema shouldn't matter (emphasis on the should).

 What version of SOLR are you using? I tried this query and it works just
 fine for me, I'm using 1.4.1

 Best
 Erick


 On Wed, Sep 8, 2010 at 4:38 AM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

 My request was very simple:
 q= astronomy^0
 And Solr returned the exception.
 Maybe the zero boost factor is not causing the exception?

 1) We indexed n documents with a Schema.xml.
 2) Then we changed some field types in the Schema.xml.
 3) Then we indexed m more documents.

 Maybe this could cause the exception?



 2010/9/7 Grant Ingersoll gsing...@apache.org

 
  On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote:
 
   Hi all,
   I need to retrieve query-results with a ranking independent from each
   query-result's default lucene score, which means assigning the same
 score
  to
   each query result.
   I tried to use a zero boost factor ( ^0 ) to reset to zero each
   query-result's score.
   This strategy seems to work within the example solr instance, but in
 my
   Solr instance, using a zero boost factor causes a Buffer Exception
   (
   HTTP Status 500 - null java.lang.IllegalArgumentException at
   java.nio.Buffer.limit(Buffer.java:249) at
  
 
 org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123)
   at
  
 
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
   at
  
 
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
   at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at
   org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at
   org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at
   org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
 at
  
 
 org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
   at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)
   )
 
  Hmm, that stack trace doesn't align w/ the boost factor.  What  was your
  request?  I think there might be something else wrong here.
 
   Do you know any other technique to reset to some fixed constant value,
  all
   the query-result's scores?
   Each query result should obtain the same score.
   Any suggestion?
 
 
  The ConstantScoreQuery or a Filter should do this.  You could do
 something
  like:
 
  q=*:*&fq=the real query, as in q=*:*&fq=field:foo
 
  -Grant
 
 
  --
  Grant Ingersoll
  http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct
 7-8
 
 


 --
 --

 Benedetti Alessandro
 Personal Page: http://tigerbolt.altervista.org

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England







-- 
Lance Norskog
goks...@gmail.com


Re: Solr Highlighting Question

2010-09-08 Thread Koji Sekiguchi

 (10/09/09 2:26), Jed Glazner wrote:

Thanks for taking time to read through this.  I'm using a checkout from

the solr 3.x branch

My problem is with the highlighter and wildcards

I can get the highlighter to work with wild cards just fine, the problem
is that  solr is returning the term matched, when what I want it to do
is highlight the chars in the term that were matched.


Example:

http://192.168.1.75:8983/solr/music/select?indent=on&q=name_title:wel*&qt=beyond&hl=true&hl.fl=name_title&f.name_title.hl.usePhraseHighlighter=true&f.name_title.hl.highlightMultiTerm=true



The results that come back look like this:

<em>Welcome</em> to the Jungle

What I want them to look like is this:
<em>Wel</em>come to the Jungle

  From what I gathered by searching the archives, Solr 1.1 used to
do this... Is there a way to get that functionality?

Thanks!



Try using FastVectorHighlighter on an n-gram field for the highlighting
problem... But FVH cannot process wildcard queries, so you would query wel
instead of wel*. That then gives you unwanted hits like vo<em>wel</em>.
I don't think there is an out-of-the-box solution for both of them today.

There is a JIRA issue, but no patches there:

https://issues.apache.org/jira/browse/SOLR-1926

Koji

--
http://www.rondhuit.com/en/
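Until that issue is addressed, one client-side workaround (an assumption on my part, not something from the thread) is to post-process the highlighted snippet and shrink the markup to the matched prefix:

```python
import re

def tighten_prefix_highlight(snippet, prefix):
    """Rewrite <em>Welcome</em> -> <em>Wel</em>come for prefix 'wel'.
    Highlighted terms that do not start with the prefix are left alone."""
    def shrink(match):
        term = match.group(1)
        if term.lower().startswith(prefix.lower()):
            n = len(prefix)
            return "<em>%s</em>%s" % (term[:n], term[n:])
        return match.group(0)
    return re.sub(r"<em>(.*?)</em>", shrink, snippet)

out = tighten_prefix_highlight("<em>Welcome</em> to the Jungle", "wel")
```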



Re: Solr, c/s type ?

2010-09-08 Thread Jason, Kim

I'd just like to use Solr in-house, in an application that is not a web
application, but I don't know how I should do that.
Thanks,

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-c-s-type-tp1392952p1444175.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delta Import with something other than Date

2010-09-08 Thread Lance Norskog

https://issues.apache.org/jira/browse/SOLR-1499

This is a patch (not committed) that queries a Solr instance and returns 
the values as a DIH document. This allows you to do a sort query to 
Solr, ask for the first result, and continue indexing after that. Scary, 
but it works.


Lance

David Yang wrote:

Currently DIH delta import uses the SQL query of type select id from
item where last_modified > ${dataimporter.last_index_time}
What I need is some field like ${dataimporter.last_primary_key}
wiki.apache.org/solr/DataImportHandler
I am thinking of storing the last primary key externally and calling the
delta-import with a parameter and using
${dataimporter.request.last_primary_key} but that seems like a very
brittle approach

Cheers,
David

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, September 08, 2010 6:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Delta Import with something other than Date

Of course you can store whatever you want in a solr index. And if you
store an integer as a Solr 1.4 int type, you can certainly query for
all documents that have greater than some specified integer in a field.

You can't use SQL to query Solr though.

I'm not sure what you're really asking?

Jonathan

David Yang wrote:
   

Hi,

I have a table that I want to index, and the table has no datetime
stamp. However, the table is append only so the primary key can only
 

go
   

up. Is it possible to store the last primary key, and use some delta
query=select id where id > ${last_id_value}

Cheers,

David



 


Re: Solr, c/s type ?

2010-09-08 Thread Dennis Gearon
You would set up a Java server (container) and run Solr/Lucene. I'm not sure
exactly how, but you then block the standard port for Solr/Lucene on that
machine from being accessible except locally.
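One way to do that blocking (a sketch, not something I've tested) is to bind the example Jetty connector to the loopback address in etc/jetty.xml:

```xml
<!-- Illustrative etc/jetty.xml fragment (Jetty 6, as shipped with the Solr
     example): setting host restricts the connector to local connections. -->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="host">127.0.0.1</Set>
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
    </New>
  </Arg>
</Call>
```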

In whatever code/application you are working with on that machine, you then
use its libraries to access 'the web', but actually only 'localhost'
(127.0.0.1, usually) at the port for Solr/Lucene.

Learn, learn, learn, and study some more about using/modifying data importers,
indexes, putting in filters, stemmers, shinglers, carpenters (joke), blah,
blah, blah, and last but not least, the almighty QUERY to access the index,
filters, etc.

Then you will have a local search engine on whatever data you had put into it.

There is also the 'embedded server', which I have only heard about. Anybody
else on this list is much more experienced than I am in general, and can
advise you better on that.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jason, Kim hialo...@gmail.com wrote:

 From: Jason, Kim hialo...@gmail.com
 Subject: Re: Solr, c/s type ?
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 8, 2010, 9:32 PM
 
 I'd just like to use solr for in-house which is not web
 application.
 But I don't know how should i do?
 Thanks,
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-c-s-type-tp1392952p1444175.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.
 


RE: Solr, c/s type ?

2010-09-08 Thread Jonathan Rochkind
You _could_ use SolrJ with EmbeddedSolrServer.  But personally I wouldn't 
unless there's a reason to.  There's no automatic reason not to use the 
ordinary Solr HTTP api, even for an in-house application which is not a web 
application.  Unless you have a real reason to use embedded solr, I'd use the 
HTTP api, possibly via SolrJ if your local application is Java. 

http://wiki.apache.org/solr/Solrj

In my (very limited, so if someone else knows better and has something to say,
listen to them) experience, using EmbeddedSolrServer ends up biting you down
the line: it doesn't work _quite_ like ordinary/typical Solr, and some things
end up not working. And you're going to be mostly on your own for
scaling/concurrency issues. Why re-invent the wheel when ordinary HTTP solr 
already works so well?  But EmbeddedSolrServer is there, if you actually have a 
need for it.  But there's no reason you can't use Solr's HTTP api for a non-web 
application, the fact that your application talks to Solr over HTTP does not 
mean your application has to talk to its users over HTTP; those are two different
things. 

Incidentally, using EmbeddedSolrServer would in fact _not_ be a client/server 
setup between your app and solr, per your original question. HTTP is a 
client/server protocol, using the ordinary Solr HTTP api is the way to set up a 
client/server relationship between your app and Solr. 

Jonathan

From: Jason, Kim [hialo...@gmail.com]
Sent: Thursday, September 09, 2010 12:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr, c/s type ?

I'd just like to use solr for in-house which is not web application.
But I don't know how should i do?
Thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-c-s-type-tp1392952p1444175.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distance sorting with spatial filtering

2010-09-08 Thread Lance Norskog
It says that the field sum(1) is not indexed. You don't have a field 
called 'sum(1)'. I know there have been a lot of changes in query
parsing, and sorting by functions may be on the list. But the _val_
trick is the older one and, as you noted, still works. The _val_ trick
sets the ranking value to the output of the function, thus indirectly 
doing what sort= does.


Lance

Scott K wrote:

I get the error on all functions.
GET 'http://localhost:8983/solr/select?q=*:*&sort=sum(1)+asc'
Error 400 can not sort on unindexed field: sum(1)

I tried another nightly build from today, Sep 7th, with the same
results. I attached the schema.xml

Thanks for the help!
Scott

On Wed, Sep 1, 2010 at 18:43, Lance Norskog goks...@gmail.com wrote:
   

Post your schema.

On Mon, Aug 30, 2010 at 2:04 PM, Scott K s...@skister.com wrote:
 

The new spatial filtering (SOLR-1586) works great and is much faster
than fq={!frange. However, I am having problems sorting by distance.
If I try
GET 
'http://localhost:8983/solr/select/?q=*:*&sort=dist(2,latitude,longitude,0,0)+asc'
I get an error:
Error 400 can not sort on unindexed field: dist(2,latitude,longitude,0,0)

I was able to work around this with
GET 'http://localhost:8983/solr/select/?q=*:* AND _val_:recip(dist(2,
latitude, longitude, 0,0),1,1,1)&fl=*,score'

But why isn't sorting by functions working? I get this error with any
function I try to sort on. This is a nightly trunk build from Aug 25th.
I see SOLR-1297 was reopened, but that seems to be for edge cases.

Second question: I am using the LatLonType from the Spatial Filtering
wiki, http://wiki.apache.org/solr/SpatialSearch
Are there any distance sorting functions that use this field, or do I
need to have three indexed fields, store_lat_lon, latitude, and
longitude, if I want both filtering and sorting by distance.

Thanks, Scott

   



--
Lance Norskog
goks...@gmail.com
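For anyone reproducing these requests programmatically, the parentheses and commas in function queries need URL escaping; a sketch of building both styles (host and field names follow the thread's examples):

```python
from urllib.parse import urlencode

# sort-by-function syntax (the one that failed on that nightly build):
sort_params = urlencode({
    "q": "*:*",
    "sort": "dist(2,latitude,longitude,0,0) asc",
})

# the _val_ workaround: fold the distance function into the score and let
# the default score ordering sort it (recip() makes nearer docs score higher)
val_params = urlencode({
    "q": '*:* AND _val_:"recip(dist(2,latitude,longitude,0,0),1,1,1)"',
    "fl": "*,score",
})
# send as http://localhost:8983/solr/select?<params>
```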

 


Re: Delta Import with something other than Date

2010-09-08 Thread Lukas Kahwe Smith

On 09.09.2010, at 00:44, David Yang wrote:

 Currently DIH delta import uses the SQL query of type select id from
 item where last_modified > ${dataimporter.last_index_time}
 What I need is some field like ${dataimporter.last_primary_key}
 wiki.apache.org/solr/DataImportHandler
 I am thinking of storing the last primary key externally and calling the
 delta-import with a parameter and using
 ${dataimporter.request.last_primary_key} but that seems like a very
 brittle approach


I am also using request parameters in my DIH import. We are not yet in
production, but in our tests it worked fine.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
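A hypothetical data-config.xml fragment for that request-parameter approach (table and column names are made up):

```xml
<!-- Hypothetical DIH entity keyed off a request parameter instead of
     last_index_time. Invoke with
     /dataimport?command=delta-import&last_primary_key=12345 -->
<entity name="item" pk="id"
        query="select id, name from item"
        deltaQuery="select id from item
                    where id &gt; '${dataimporter.request.last_primary_key}'"
        deltaImportQuery="select id, name from item
                          where id = '${dataimporter.delta.id}'"/>
```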