can solr admin tab statistics be customized... how can this be achieved?

2012-07-19 Thread yayati


Hi, 

I want to compute my own stats in addition to Solr's default stats. How can
I extend the statistics in Solr? Solr computes stats cumulatively; is there
any way to get per-interval, non-cumulative stats?

Thanks... waiting for replies.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-solr-admin-tab-statistics-be-customized-how-can-this-be-achived-tp3996128.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: help: I always get NULL with row.get(columnName)

2012-07-19 Thread Roy Liu
Does anyone know?

On Thu, Jul 19, 2012 at 5:48 PM, Roy Liu  wrote:

> Hi,
>
> When I use a Transformer to handle files, I always get NULL from
> row.get(columnName).
> Does anyone know why?
>
> --
> The following file is *data-config.xml*
>
> <dataConfig>
>   <dataSource name="ds"
>     driver="oracle.jdbc.driver.OracleDriver"
>     url="jdbc:oracle:thin:@10.1.1.1:1521:sid"
>     user="username"
>     password="pwd"
>     />
>   <document>
>     <entity name="report"
>       query="select a.objid as ID from DOCGENERAL a where
>       a.objid=14154965">
>
>       <entity name="attachment"
>         query="select docid as ID, name as filename,
>         storepath as filepath from attachment where docid=${report.ID}"
>         transformer="com.bs.solr.BSFileTransformer">
>         <field column="ID" name="bs_attachment_id" />
>         <field column="filename" name="bs_attachment_name" />
>         <field column="filepath" isfile="true" />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
>
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.List;
> import java.util.Map;
>
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
> import org.apache.solr.handler.dataimport.Context;
> import org.apache.solr.handler.dataimport.Transformer;
> import org.apache.tika.Tika;
> import org.apache.tika.exception.TikaException;
>
> public class BSFileTransformer extends Transformer {
>   private static Log LOGGER = LogFactory.getLog(BSFileTransformer.class);
>
>   @Override
>   public Object transformRow(Map<String, Object> row, Context context) {
>     // row.get("filename") is always null, but row.get("id") is OK.
>     System.out.println("==filename:" + row.get("filename"));
>
>     List<Map<String, String>> fields = context.getAllEntityFields();
>
>     String id = null; // entity ID
>     String fileName = "NONAME";
>     for (Map<String, String> field : fields) {
>       String name = field.get("name");
>       System.out.println("name:" + name);
>       if ("bs_attachment_id".equals(name)) {
>         String columnName = field.get("column");
>         id = String.valueOf(row.get(columnName));
>       }
>       if ("bs_attachment_name".equals(name)) {
>         String columnName = field.get("column");
>         fileName = (String) row.get(columnName);
>       }
>       String isFile = field.get("isfile");
>       if ("true".equals(isFile)) {
>         String columnName = field.get("column");
>         String filePath = (String) row.get(columnName);
>         try {
>           System.out.println("fileName:" + fileName + ", filePath: " + filePath);
>           if (filePath != null) {
>             File file = new File(filePath);
>             InputStream inputStream = new FileInputStream(file);
>             Tika tika = new Tika();
>             String text = tika.parseToString(inputStream);
>             row.put(columnName, text);
>           }
>           LOGGER.info("Processed File OK! Entity: " + fileName + ", ID: " + id);
>         } catch (IOException ioe) {
>           LOGGER.error(ioe.getMessage());
>           row.put(columnName, "");
>         } catch (TikaException e) {
>           LOGGER.error("Parse File Error:" + id + ", Error:" + e.getMessage());
>           row.put(columnName, "");
>         }
>       }
>     }
>     return row;
>   }
> }
>


Re: How to setup SimpleFSDirectoryFactory

2012-07-19 Thread Bill Bell
Thanks. Are you saying that if we run low on memory, MMapDirectory will
stop using it? Will the least-used memory be reclaimed by the OS
automatically? I see some paging. Wouldn't paging slow down querying?

My index is 10GB, and every 8 hours we get most of it into shared memory.
The memory is 99 percent used, and that does not leave any room for other
apps.

Other implications?

Sent from my mobile device
720-256-8076

On Jul 19, 2012, at 9:49 AM, "Uwe Schindler"  wrote:

> Read this, then you will see that MMapDirectory will use 0% of your Java Heap 
> space or free system RAM:
> 
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> 
> Uwe
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
>> -Original Message-
>> From: William Bell [mailto:billnb...@gmail.com]
>> Sent: Tuesday, July 17, 2012 6:05 AM
>> Subject: How to setup SimpleFSDirectoryFactory
>> 
>> We all know that MMapDirectory is fastest. However, we cannot always use
>> it, since you might run out of memory on large indexes, right?
>> 
>> Here is how I got SimpleFSDirectoryFactory to work. Just set
>> -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
>> 
>> Your solrconfig.xml:
>> 
>> <directoryFactory name="DirectoryFactory"
>> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>> 
>> You can check it with http://localhost:8983/solr/admin/stats.jsp
>> 
>> Notice that the default for 64-bit Windows is MMapDirectory; otherwise it
>> is NIOFSDirectory, except on other Windows setups, where it is
>> SimpleFSDirectory. It would be nicer if we just set it all up
>> with a helper in solrconfig.xml...
>> 
>> if (Constants.WINDOWS) {
>>   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
>>     return new MMapDirectory(path, lockFactory);
>>   else
>>     return new SimpleFSDirectory(path, lockFactory);
>> } else {
>>   return new NIOFSDirectory(path, lockFactory);
>> }
>> 
>> 
>> 
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
> 
> 


custom sorter

2012-07-19 Thread Siping Liu
Hi,
I have a requirement to place a document at a pre-determined position for
special filter query values; for instance, when the filter query is
fq=(field1:"xyz"), place document abc as the first result (the rest of the
result set will be ordered by sort=field2). I guess I have to plug in my
Java code as a custom sorter. I'd appreciate it if someone could shed
light on this (how to add a custom sorter, etc.).
TIA.


Re: queryResultCache not checked with fieldCollapsing

2012-07-19 Thread Chris Hostetter

: When I run dismax queries I see there are no lookups in the
: queryResultCache.  If I remove the field collapsing - lookups happen.  I
: can't find any mention of this anywhere or think of reason why this should

I'm not very familiar with the grouping code, but I think the crux of
what you are seeing is that when you use grouping, the queryResultCache
isn't used because it can't be. The queryResultCache is a mapping from
(query, sort, filters) keys to DocList values, but with grouping you
don't have a simple DocList any more, so there is nothing that can go
into (or come out of) the cache.

There are probably opportunities for other things to be cached when 
grouping is used (using new SolrCaches) but I'm not sure what/how.

: disable caching.  I've tried playing with the group.cache.percent parameter

group.cache.percent is ... something different.  I don't remember how 
exactly it works (mvg: ping?), but it definitely doesn't affect any usage 
of the queryResultCache.

If your main concern is caching entire requests (i.e. query options, 
facets, filters, sort, grouping, etc.) then I would suggest you consider 
putting an HTTP cache in front of Solr.
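
For example, Solr's own solrconfig.xml can emit the cache-validation
headers such a front-end cache needs (a sketch; the max-age value is
illustrative):

<requestDispatcher>
  <!-- derive Last-Modified from the index open time and emit an ETag,
       so a reverse proxy in front of Solr can validate cached responses -->
  <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
    <cacheControl>max-age=300, public</cacheControl>
  </httpCaching>
</requestDispatcher>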


-Hoss


Redirecting SolrQueryRequests to another core with Handler

2012-07-19 Thread Nicholas Ball

What is the best way to redirect a SolrQueryRequest to another core from
within a handler (a custom SearchHandler)?

I've tried finding the SolrCore of the core I want to redirect to and
calling its execute() method with the same params, but it looks like the
SolrQueryRequest object already has the old core name embedded in it! I
want to do this without making a new request and going through the
servlet, etc.

* Note that I had to have an empty core with a special name just to do
this redirection in the first place; if there is a better way to proceed,
please let me know too :)

Many thanks for any help you can give,
Nicholas (incunix)


Re: How to Increase the number of connections on Solr/Tomcat6?

2012-07-19 Thread Michael Della Bitta
Hi Bruno,

It's usually the maxThreads attribute in the <Connector> tag in
$CATALINA_HOME/conf/server.xml. But I kind of doubt you're running out
of threads... maybe you could post some more details about the system
you're running Solr on.
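
For reference, that attribute lives on the connector definition in
server.xml (a sketch; the port and thread count shown are illustrative):

<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="400"
           connectionTimeout="20000"
           redirectPort="8443" />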

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jul 19, 2012 at 6:47 PM, Bruno Mannina  wrote:
> Dear Solr User,
>
> I don't know if this is the right place to post my question, but I'm sure
> some users have already had my problem.
>
> Actually, I do 1556 requests with 4 HTTP components from my program. If I
> do these requests without a delay (500ms) before sending each request, I
> get around 10% of requests with an empty answer. If I add a delay before
> each request, I have no empty answers.
>
> An empty answer has HTTP 200 OK and a correct header, but Body = ''
>
> Where can I increase the limit on simultaneous Tomcat/Solr requests, or
> how else can I solve my problem?
>
> Thanks a lot for your Help,
> Bruno


How to Increase the number of connections on Solr/Tomcat6?

2012-07-19 Thread Bruno Mannina

Dear Solr User,

I don't know if this is the right place to post my question, but I'm sure
some users have already had my problem.


Actually, I do 1556 requests with 4 HTTP components from my program. If I
do these requests without a delay (500ms) before sending each request, I
get around 10% of requests with an empty answer. If I add a delay before
each request, I have no empty answers.


An empty answer has HTTP 200 OK and a correct header, but Body = ''

Where can I increase the limit on simultaneous Tomcat/Solr requests, or
how else can I solve my problem?


Thanks a lot for your Help,
Bruno


Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Briggs Thompson
Thanks Mark!

On Thu, Jul 19, 2012 at 4:07 PM, Mark Miller  wrote:

> https://issues.apache.org/jira/browse/SOLR-3649
>
> On Thu, Jul 19, 2012 at 3:34 PM, Briggs Thompson <
> w.briggs.thomp...@gmail.com> wrote:
>
> > This is unrelated for the most part, but the javabin update request
> > handler does not seem to be working properly when calling the SolrJ
> > method HttpSolrServer.deleteById(List<String> ids). A single id gets
> > deleted from the index as opposed to the full list. It appears properly
> > in the logs - it shows deletes of all ids sent, although all but one
> > remain in the index.
> >
> > I confirmed that the default update request handler deletes the list
> > properly, so this appears to be a problem with
> > the BinaryUpdateRequestHandler.
> >
> > Not an issue for me, just spreading the word.
> >
> > Thanks,
> > Briggs
> >
> > On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller 
> > wrote:
> >
> > > we really need to resolve that issue soon...
> > >
> > > On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:
> > >
> > > > Yury,
> > > >
> > > > Thank you so much! That was it. Man, I spent a good long while
> trouble
> > > > shooting this. Probably would have spent quite a bit more time. I
> > > > appreciate your help!!
> > > >
> > > > -Briggs
> > > >
> > > > On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats 
> wrote:
> > > >
> > > >> On 7/18/2012 7:11 PM, Briggs Thompson wrote:
> > > >>> I have realized this is not specific to SolrJ but to my instance of
> > > >> Solr. Using curl to delete by query is not working either.
> > > >>
> > > >> Can be this: https://issues.apache.org/jira/browse/SOLR-3432
> > > >>
> > >
> > > - Mark Miller
> > > lucidimagination.com
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>


Re: Is it possible to alias a facet field?

2012-07-19 Thread Chris Hostetter

: > facet.field=testfield&facet.field=%7B!key=mylabel%7Dtestfield&f.mylabel.limit=1
: >
: > but the limit on the alias didn't seem to work.  Is this expected?
: 
: Per-field params don't currently look under the alias.  I believe
: there's a JIRA open for this.

https://issues.apache.org/jira/browse/SOLR-1351

There's a fairly old patch that covers some of the basics, but there are 
some tricky edge cases that need to be accounted for and we need good 
distributed tests to make sure things work properly.


-Hoss


RE: solr 4.0 cloud 303 error

2012-07-19 Thread John-Paul Drawneek
I did a search via both the admin UI and /search.

What I searched for was *:*, as that was the default in the search box in
the admin UI (so I expected something other than a 303 error).

I will post the URL and server logs tomorrow when I am back in the office.

But I don't think the admin URL was anything odd.

The server logs were full of chatter between the nodes in the cloud setup.


From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: 19 July 2012 23:03
To: solr-user
Subject: Re: solr 4.0 cloud 303 error

: > try to do a search - throws 303 error

Can you be specific about how exactly you did the search?

Was this from the admin UI?  what URL was in your browser location bar?
what values did you put in the form? what buttons did you click? what URL
was in your browser location bar when the error happened?

Can you post the logs from each of the servers from around the time of
this error (a few lines of context before it happened as well)?

: >> org.apache.solr.common.SolrException: Server at
: >> http://linux-vckp:8983/solr/collection1 returned non ok status:303,

that smells like something jetty *might* be returning automatically
because the client asked for...
   http://linux-vckp:8983/solr/collection1
...instead of...
   http://linux-vckp:8983/solr/collection1/
... (i.e. no trailing slash) ... but I'm not sure why HttpShardHandler
would be asking for either of those URLs without specifying a handler.



-Hoss









Re: Count is inconsistent between facet and stats

2012-07-19 Thread Chris Hostetter

: So from StatsComponent the count for 'electronics' cat is 3, while
: FacetComponent report 14 'electronics'. Is this a bug?
: 
: Following is the field definition for 'cat'.
: 

FYI...

https://issues.apache.org/jira/browse/SOLR-3642

(The underlying problem is that the stats.facet feature doesn't work for 
multivalued fields, and the check that was supposed to return an error in 
this case was only checking the field type, not the field.)


-Hoss


Re: solr 4.0 cloud 303 error

2012-07-19 Thread Chris Hostetter

: > try to do a search - throws 303 error

Can you be specific about how exactly you did the search?

Was this from the admin UI?  what URL was in your browser location bar? 
what values did you put in the form? what buttons did you click? what URL 
was in your browser location bar when the error happened?

Can you post the logs from each of the servers from around the time of
this error (a few lines of context before it happened as well)?

: >> org.apache.solr.common.SolrException: Server at
: >> http://linux-vckp:8983/solr/collection1 returned non ok status:303,

that smells like something jetty *might* be returning automatically
because the client asked for...
   http://linux-vckp:8983/solr/collection1
...instead of...
   http://linux-vckp:8983/solr/collection1/
... (i.e. no trailing slash) ... but I'm not sure why HttpShardHandler
would be asking for either of those URLs without specifying a handler.



-Hoss


Re: solr 4.0 cloud 303 error

2012-07-19 Thread Mark Miller
Okay - I'll do the same in a bit and report back.

On Jul 19, 2012, at 5:23 PM, John-Paul Drawneek wrote:

> This is just out of the box.
> 
> All I did was download solr 4 Alpha from the site.
> unpack
> follow instructions from wiki.
> 
> admin console worked - great
> 
> try to do a search - throws 303 error
> 
> Downloaded nightly build, same issue.
> 
> Also got errors from the other shard with error connecting due to master 
> throwing 303 errors.
> 
> From: Mark Miller [markrmil...@gmail.com]
> Sent: 19 July 2012 22:11
> To: solr-user@lucene.apache.org
> Subject: Re: solr 4.0 cloud 303 error
> 
> That's really odd - never seen or heard anything like it. A 303 is what a
> server will respond with if you should GET a different URI...
> 
> This won't happen out of the box that I've ever seen... can you tell us
> about any customizations you have made?
> 
> On Thu, Jul 19, 2012 at 1:08 PM, John-Paul Drawneek <
> jpdrawn...@nationaltheatre.org.uk> wrote:
> 
>> Hi.
>> 
>> playing with the new solrcloud:
>> http://wiki.apache.org/solr/SolrCloud
>> 
>> tried alpha + nightly build 19/07/2012
>> 
>> admin panel works, but select queries fail with:
>> 
>> org.apache.solr.common.SolrException: Server at
>> http://linux-vckp:8983/solr/collection1 returned non ok status:303,
>> message:See Other at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:376)
>> at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
>> at
>> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:165)
>> at
>> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:132)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at
>> java.util.concurrent.FutureTask.run(FutureTask.java:166) at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at
>> java.util.concurrent.FutureTask.run(FutureTask.java:166) at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> at java.lang.Thread.run(Thread.java:722)
>> 
>> on all combinations of solrcloud from the example in the wiki.
>> 
>> Is this a known issue?
>> 
>> Had a quick look at the tracker/wiki/google but did not find any reference
>> to this.
>> 
>> 
>> 
>> 
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 

- Mark Miller
lucidimagination.com













RE: solr 4.0 cloud 303 error

2012-07-19 Thread John-Paul Drawneek
This is just out of the box.

All I did was download solr 4 Alpha from the site.
unpack
follow instructions from wiki.

admin console worked - great

try to do a search - throws 303 error

Downloaded nightly build, same issue.

Also got errors from the other shard with error connecting due to master 
throwing 303 errors.

From: Mark Miller [markrmil...@gmail.com]
Sent: 19 July 2012 22:11
To: solr-user@lucene.apache.org
Subject: Re: solr 4.0 cloud 303 error

That's really odd - never seen or heard anything like it. A 303 is what a
server will respond with if you should GET a different URI...

This won't happen out of the box that I've ever seen... can you tell us
about any customizations you have made?

On Thu, Jul 19, 2012 at 1:08 PM, John-Paul Drawneek <
jpdrawn...@nationaltheatre.org.uk> wrote:

> Hi.
>
> playing with the new solrcloud:
> http://wiki.apache.org/solr/SolrCloud
>
> tried alpha + nightly build 19/07/2012
>
> admin panel works, but select queries fail with:
>
> org.apache.solr.common.SolrException: Server at
> http://linux-vckp:8983/solr/collection1 returned non ok status:303,
> message:See Other at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:376)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:165)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:132)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at
> java.util.concurrent.FutureTask.run(FutureTask.java:166) at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at
> java.util.concurrent.FutureTask.run(FutureTask.java:166) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>
> on all combinations of solrcloud from the example in the wiki.
>
> Is this a known issue?
>
> Had a quick look at the tracker/wiki/google but did not find any reference
> to this.
>
>
>
>


--
- Mark

http://www.lucidimagination.com









Re: solr 4.0 cloud 303 error

2012-07-19 Thread Mark Miller
That's really odd - never seen or heard anything like it. A 303 is what a
server will respond with if you should GET a different URI...

This won't happen out of the box that I've ever seen... can you tell us
about any customizations you have made?

On Thu, Jul 19, 2012 at 1:08 PM, John-Paul Drawneek <
jpdrawn...@nationaltheatre.org.uk> wrote:

> Hi.
>
> playing with the new solrcloud:
> http://wiki.apache.org/solr/SolrCloud
>
> tried alpha + nightly build 19/07/2012
>
> admin panel works, but select queries fail with:
>
> org.apache.solr.common.SolrException: Server at
> http://linux-vckp:8983/solr/collection1 returned non ok status:303,
> message:See Other at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:376)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:165)
> at
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:132)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at
> java.util.concurrent.FutureTask.run(FutureTask.java:166) at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at
> java.util.concurrent.FutureTask.run(FutureTask.java:166) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>
> on all combinations of solrcloud from the example in the wiki.
>
> Is this a known issue?
>
> Had a quick look at the tracker/wiki/google but did not find any reference
> to this.
>
>
>
>


-- 
- Mark

http://www.lucidimagination.com


Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Mark Miller
https://issues.apache.org/jira/browse/SOLR-3649

On Thu, Jul 19, 2012 at 3:34 PM, Briggs Thompson <
w.briggs.thomp...@gmail.com> wrote:

> This is unrelated for the most part, but the javabin update request
> handler does not seem to be working properly when calling the SolrJ
> method HttpSolrServer.deleteById(List<String> ids). A single id gets
> deleted from the index as opposed to the full list. It appears properly
> in the logs - it shows deletes of all ids sent, although all but one
> remain in the index.
>
> I confirmed that the default update request handler deletes the list
> properly, so this appears to be a problem with
> the BinaryUpdateRequestHandler.
>
> Not an issue for me, just spreading the word.
>
> Thanks,
> Briggs
>
> On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller 
> wrote:
>
> > we really need to resolve that issue soon...
> >
> > On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:
> >
> > > Yury,
> > >
> > > Thank you so much! That was it. Man, I spent a good long while trouble
> > > shooting this. Probably would have spent quite a bit more time. I
> > > appreciate your help!!
> > >
> > > -Briggs
> > >
> > > On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats  wrote:
> > >
> > >> On 7/18/2012 7:11 PM, Briggs Thompson wrote:
> > >>> I have realized this is not specific to SolrJ but to my instance of
> > >> Solr. Using curl to delete by query is not working either.
> > >>
> > >> Can be this: https://issues.apache.org/jira/browse/SOLR-3432
> > >>
> >
> > - Mark Miller
> > lucidimagination.com
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>



-- 
- Mark

http://www.lucidimagination.com


Re: Reg issue with indexing data from one of the sqlserver DB

2012-07-19 Thread Michael Della Bitta
Your password has an & in it. Since this is an XML file, you need to
turn it into an XML entity, so your password should be entered as:

8ty&amp;2ty=6

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jul 19, 2012 at 3:54 PM, lakshmi bhargavi
 wrote:
> Hi Team ,
>
> Greetings!
>
> We are trying to index data from one of our SQL Server 2008 databases, but
> we are getting the following error on startup:
>
> Jul 19, 2012 2:41:11 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException
> at org.apache.solr.core.SolrCore.(SolrCore.java:600)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:483)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:335)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:219)
> at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
> at
> org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
> at
> org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:103)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
> at
> org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
> at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
> at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
> at
> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
> at
> org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1585)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
> Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.solr.common.SolrException: FATAL: Could not create
> importer. DataImporter config invalid
> at
> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:124)
> at
> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:527)
> at org.apache.solr.core.SolrCore.(SolrCore.java:594)
> ... 23 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> Exception occurred while initializing context
> at
> org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:216)
> at
> org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:108)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:117)
> ... 25 more
> Caused by: org.xml.sax.SAXParseException: The entity name must immediately
> follow the '&' in the entity reference.
> at
> com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown
> Source)
> at
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown
> Source)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown
> Source)
> at com.sun.org.apache.xerces.internal

Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Briggs Thompson
This is unrelated for the most part, but the javabin update request handler
does not seem to be working properly when calling the SolrJ method
HttpSolrServer.deleteById(List<String> ids). A single id gets deleted from
the index as opposed to the full list. It appears properly in the logs - it
shows deletes of all ids sent, although all but one remain in the index.

I confirmed that the default update request handler deletes the list
properly, so this appears to be a problem with
the BinaryUpdateRequestHandler.

Not an issue for me, just spreading the word.

Thanks,
Briggs

On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller  wrote:

> we really need to resolve that issue soon...
>
> On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:
>
> > Yury,
> >
> > Thank you so much! That was it. Man, I spent a good long while trouble
> > shooting this. Probably would have spent quite a bit more time. I
> > appreciate your help!!
> >
> > -Briggs
> >
> > On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats  wrote:
> >
> >> On 7/18/2012 7:11 PM, Briggs Thompson wrote:
> >>> I have realized this is not specific to SolrJ but to my instance of
> >> Solr. Using curl to delete by query is not working either.
> >>
> >> Can be this: https://issues.apache.org/jira/browse/SOLR-3432
> >>
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>
>


Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Aaron Daubman
Robert,

So this is lossy: basically you can think of there being only 256
> possible values. So when you increased the number of terms only
> slightly by changing your analysis, this happened to bump you over the
> edge rounding you up to the next value.
>
> more information:
> http://lucene.apache.org/core/3_6_0/scoring.html
>
> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Similarity.html



Thanks - this was extremely helpful! I had read both sources before but
didn't grasp the magnitude of the lossiness until your pointer and mention
of the edge case.
Just to help out anybody else who might run into this, I hacked together a
little harness to demonstrate:
---
fieldLength: 160, computeNorm: 0.07905694, floatToByte315: 109,
byte315ToFloat: 0.078125
fieldLength: 161, computeNorm: 0.07881104, floatToByte315: 109,
byte315ToFloat: 0.078125
fieldLength: 162, computeNorm: 0.07856742, floatToByte315: 109,
byte315ToFloat: 0.078125
fieldLength: 163, computeNorm: 0.07832605, floatToByte315: 109,
byte315ToFloat: 0.078125
fieldLength: 164, computeNorm: 0.07808688, floatToByte315: 108,
byte315ToFloat: 0.0625
fieldLength: 165, computeNorm: 0.077849895, floatToByte315: 108,
byte315ToFloat: 0.0625
fieldLength: 166, computeNorm: 0.07761505, floatToByte315: 108,
byte315ToFloat: 0.0625
---
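
(A minimal version of such a harness, assuming Lucene 3.6's SmallFloat and
the DefaultSimilarity-style 1/sqrt(numTerms) length norm, might look like
this:)

import org.apache.lucene.util.SmallFloat;

public class NormLossHarness {
    public static void main(String[] args) {
        for (int len = 160; len <= 166; len++) {
            // DefaultSimilarity computes the length norm as 1/sqrt(numTerms)
            float norm = (float) (1.0 / Math.sqrt(len));
            // at index time the float is squeezed into a single byte...
            byte encoded = SmallFloat.floatToByte315(norm);
            // ...and at search time that byte is decoded back to a float
            float decoded = SmallFloat.byte315ToFloat(encoded);
            System.out.println("fieldLength: " + len
                + ", computeNorm: " + norm
                + ", floatToByte315: " + encoded
                + ", byte315ToFloat: " + decoded);
        }
    }
}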

So my takeaway is that the significantly varying scores are caused by:
1) a field whose length sits right on this encoding boundary between the
two analyzer chains, and
2) the fact that we might be matching 50+ query values against a field
with 150+ values, so the overall score is repeatedly impacted by the
otherwise typically insignificant change in fieldNorm value.

Thanks again,
 Aaron


RE: Solr Commit not working after delete

2012-07-19 Thread Rohit
Hi Brandan,

I am not sure I get what's being suggested. Our delete worked fine, but now
no new data is going into the system.

Could you please shed some more light? 

Regards,
Rohit

-Original Message-
From: Brendan Grainger [mailto:brendan.grain...@gmail.com] 
Sent: 19 July 2012 17:33
To: solr-user@lucene.apache.org
Subject: Re: Solr Commit not working after delete

You might be running into the same issue someone else had the other day:

https://issues.apache.org/jira/browse/SOLR-3432



On Jul 19, 2012, at 1:23 PM, Rohit wrote:

> We deleted some data from Solr, after which Solr is not accepting any 
> commits. What could be wrong?
> 
> 
> 
> We don't see any error in logs or anywhere else.
> 
> 
> 
> Regards,
> 
> Rohit
> 
> 
> 





Re: LUCENE-2899 patch, OpenNLPTokenizer compile error

2012-07-19 Thread Thomas Matthijs
>
> and get the following errors:
> ---
>
>  [javac] warning: [options] bootstrap class path not set in conjunction
> with -source 1.6
> [javac]
> /home/swu/newproject/lucene_4x/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:170:
> error: method reset in class TokenStream cannot be applied to given types;
> [javac] super.reset(input);
> [javac]  ^
> [javac]   required: no arguments
> [javac]   found: Reader
> [javac]   reason: actual and formal argument lists differ in length
> [javac]


reset was renamed to setReader
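
In the patch's OpenNLPTokenizer that means changing the failing call along
these lines (a sketch; the surrounding override needs adjusting to the 4.x
Tokenizer API as well):

// before - no longer compiles against current 4.x:
super.reset(input);

// after - Tokenizer's reset(Reader) became setReader(Reader):
setReader(input);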


LUCENE-2899 patch, OpenNLPTokenizer compile error

2012-07-19 Thread sam wu
I am following the instructions at
http://wiki.apache.org/solr/OpenNLP to test the OpenNLP/Solr integration:

1. pull the 4.0 branch from trunk
2. apply the LUCENE-2899 patch
(there are several LUCENE-2899 patch files; I took the 385KB one from
02/Jul/12 08:05 - I should only apply that one, correct?)
3. ant compile

and get the following errors:
---

 [javac] warning: [options] bootstrap class path not set in conjunction
with -source 1.6
[javac]
/home/swu/newproject/lucene_4x/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:170:
error: method reset in class TokenStream cannot be applied to given types;
[javac] super.reset(input);
[javac]  ^
[javac]   required: no arguments
[javac]   found: Reader
[javac]   reason: actual and formal argument lists differ in length
[javac]
/home/swu/newproject/lucene_4x/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:168:
error: method does not override or implement a method from a supertype
[javac]   @Override
[javac]   ^
[javac] 2 errors
[javac] 2 warnings

BUILD FAILED


---

I am running Java 1.7. Does the patch only work with Java 1.6, or am I
doing something wrong?


Thanks


Sam


Re: Solr Commit not working after delete

2012-07-19 Thread Brendan Grainger
You might be running into the same issue someone else had the other day:

https://issues.apache.org/jira/browse/SOLR-3432



On Jul 19, 2012, at 1:23 PM, Rohit wrote:

> We deleted some data from Solr, after which Solr is not accepting any
> commits. What could be wrong?
> 
> 
> 
> We don't see any error in logs or anywhere else.
> 
> 
> 
> Regards,
> 
> Rohit
> 
> 
> 



Solr Commit not working after delete

2012-07-19 Thread Rohit
We deleted some data from Solr, after which Solr is not accepting any
commits. What could be wrong?

 

We don't see any error in logs or anywhere else.

 

Regards,

Rohit

 



Re: Problem with Solr logging under Jetty

2012-07-19 Thread Rémy Loubradou
Hello,

I have a similar problem; is there anything new about this issue?

My problem is that INFO logs go to stderr and not stdout - do you have an
explanation?

For the log level I use the file "logging.properties" with only one line
in it, setting the level:

.level = INFO

and I have a configuration file called "jetty-logging.xml", passed at
startup of Jetty through start.ini, that redirects stdout to one file and
stderr to another. The config looks like this:



<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
  "http://www.eclipse.org/jetty/configure.dtd">
<Configure id="Server" class="org.eclipse.jetty.server.Server">

  <New id="ServerLogErr" class="java.io.PrintStream">
    <Arg>
      <New class="org.eclipse.jetty.util.RolloverFileOutputStream">
        <Arg>/jetty_err.log</Arg>
        <Arg type="boolean">true</Arg>
        <Arg type="int">90</Arg>
        <Arg><Call class="java.util.TimeZone" name="getTimeZone"><Arg>GMT</Arg></Call></Arg>
        <Get id="ServerLogErrName" name="datedFilename"/>
      </New>
    </Arg>
  </New>

  <New id="ServerLogOut" class="java.io.PrintStream">
    <Arg>
      <New class="org.eclipse.jetty.util.RolloverFileOutputStream">
        <Arg>/jetty_out.log</Arg>
        <Arg type="boolean">true</Arg>
        <Arg type="int">90</Arg>
        <Arg><Call class="java.util.TimeZone" name="getTimeZone"><Arg>GMT</Arg></Call></Arg>
        <Get id="ServerLogOutName" name="datedFilename"/>
      </New>
    </Arg>
  </New>

  <Call class="org.eclipse.jetty.util.log.Log" name="info">
    <Arg>Redirecting stdout to <Ref id="ServerLogOutName"/></Arg>
  </Call>
  <Call class="java.lang.System" name="setOut">
    <Arg><Ref id="ServerLogOut"/></Arg>
  </Call>

  <Call class="org.eclipse.jetty.util.log.Log" name="info">
    <Arg>Redirecting stderr to <Ref id="ServerLogErrName"/></Arg>
  </Call>
  <Call class="java.lang.System" name="setErr">
    <Arg><Ref id="ServerLogErr"/></Arg>
  </Call>

</Configure>


Thanks for your help,
Remy

On 23 November 2011 16:55, Shawn Heisey  wrote:

> I am having a problem with jdk logging with Solr, using the jetty included
> with Solr.
>
> In jetty.xml, I have the following defined:
>
> <Call class="java.lang.System" name="setProperty">
>   <Arg>java.util.logging.config.file</Arg>
>   <Arg>etc/logging.properties</Arg>
> </Call>
>

Contents of etc/logging.properties:
> ==
> #  Logging level
> .level=WARNING
>
> # Write to a file
> handlers = java.util.logging.FileHandler
>
> # Write log messages in human readable format:
> java.util.logging.FileHandler.**formatter = java.util.logging.**
> SimpleFormatter
> java.util.logging.**ConsoleHander.formatter = java.util.logging.**
> SimpleFormatter
>
> # Log to the log subdirectory, with log files named solr_log-n.log
> java.util.logging.FileHandler.**pattern = ./log/solr_log-%g.log
> java.util.logging.FileHandler.**append = true
> java.util.logging.FileHandler.**count = 10
> java.util.logging.FileHandler.**limit = 10485760
> ==
>
> This actually all seems to work perfectly at first.  I changed the logging
> level to INFO in the solr admin, and it still seemed to work.  Then at some
> point it stopped logging to solr_log-0.log and started logging to stderr.
>  My init script for Solr sends that to a file, but there's no log rotation
> on that file and it is overwritten whenever Solr is restarted.
>
> With the same config, OS version, java version, and everything else I can
> think of, my test server is still working, but all of my production servers
> aren't.  It does seem to be related to changing the log level to INFO in
> the gui, but making that change doesn't make it fail right away.
>
> What information can I provide to help troubleshoot this?
>
> Thanks,
> Shawn
>
>


Re: Importing data to Solr

2012-07-19 Thread Erick Erickson
First, turn off all your soft commit stuff, that won't help in your situation.
If you do leave autocommit on, make it a really high number
(let's say 1,000,000 to start).

You won't have to make 300M calls, you can batch, say, 1,000 docs
into each request.

DIH supports a bunch of different data sources, take a
look at: http://wiki.apache.org/solr/DataImportHandler, the
EntityProcessor, DataSource and the like.

There is also the CSV update processor, see:
http://wiki.apache.org/solr/UpdateCSV. It might be better to, say,
break up your massive file into N CSV files and import those.

Best
Erick

On Thu, Jul 19, 2012 at 12:04 PM, Jonatan Fournier
 wrote:
> Hello,
>
> I was wondering if there are other ways to import data into Solr than
> posting xml/json/csv to the server URL (e.g. locally building the
> index). Is the DataImporter only for databases?
>
> My data is in an enormous text file that is parsed in Python; I can get
> clean json/xml out of it if I want, but the thing is that it drills
> down to about 300 million "documents", so I don't want to execute 300
> million HTTP POSTs in a for loop; even with relaxed soft commits etc.
> it would take weeks or months to populate the index.
>
> I need to do this only once on an offline server and never add data
> back to the index (i.e. it becomes a read-only instance).
>
> Is there any temporary index configuration I could use to populate the
> server with optimal add speed, then turn the settings back to ones
> optimized for a read-only instance?
>
> Thanks!
>
> --
> jonatan


Re: Importing data to Solr

2012-07-19 Thread Michael Della Bitta
Hi Jonatan,

Ideally you'd use a Solr API client that allowed batched updates, so
you'd be sending documents 100 at a time, say. Alternatively, if
you're good with Java, you could build an index by using the
EmbeddedSolrServer class in the same process as the code you use to
parse the documents. But if your Solr API client is using batches and
multiple connections, I'm not sure if the tradeoff is worth it.
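
A minimal sketch of that batching approach with SolrJ (the URL, field
names, and batch size here are assumptions):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoader {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 300000000; i++) { // stand-in for the real parse loop
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("text", "parsed record " + i);
            batch.add(doc);
            if (batch.size() == 1000) { // send 1,000 docs per request
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit(); // one commit at the very end
    }
}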

Also, there are some various efforts out there to build indexes in
Hadoop, but I don't believe any of them are 100% production ready
(would like to be proven wrong.)

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jul 19, 2012 at 12:04 PM, Jonatan Fournier
 wrote:
> Hello,
>
> I was wondering if there are other ways to import data into Solr than
> posting xml/json/csv to the server URL (e.g. locally building the
> index). Is the DataImporter only for databases?
>
> My data is in an enormous text file that is parsed in Python; I can get
> clean json/xml out of it if I want, but the thing is that it drills
> down to about 300 million "documents", so I don't want to execute 300
> million HTTP POSTs in a for loop; even with relaxed soft commits etc.
> it would take weeks or months to populate the index.
>
> I need to do this only once on an offline server and never add data
> back to the index (i.e. it becomes a read-only instance).
>
> Is there any temporary index configuration I could use to populate the
> server with optimal add speed, then turn the settings back to ones
> optimized for a read-only instance?
>
> Thanks!
>
> --
> jonatan


Importing data to Solr

2012-07-19 Thread Jonatan Fournier
Hello,

I was wondering if there are other ways to import data into Solr than
posting xml/json/csv to the server URL (e.g. locally building the
index). Is the DataImporter only for databases?

My data is in an enormous text file that is parsed in Python; I can get
clean json/xml out of it if I want, but the thing is that it drills
down to about 300 million "documents", so I don't want to execute 300
million HTTP POSTs in a for loop; even with relaxed soft commits etc.
it would take weeks or months to populate the index.

I need to do this only once on an offline server and never add data
back to the index (i.e. it becomes a read-only instance).

Is there any temporary index configuration I could use to populate the
server with optimal add speed, then turn the settings back to ones
optimized for a read-only instance?

Thanks!

--
jonatan


Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Robert Muir
On Thu, Jul 19, 2012 at 11:11 AM, Aaron Daubman  wrote:

> Apologies if I didn't clearly state my goal/concern: I am not looking for
> the exact same scoring - I am looking to explain scoring differences.
>  Deprecated components will eventually go away, time moves on, etc...
> etc... I would like to be able to run current code, and should be able to -
> the part that is sticking is being able to *explain* the difference in
> results.
>

OK: i totally missed that, sorry!

to explain why you see such a large difference:

The difference is that these length normalizations are computed at
index time and fit inside a *single byte* by default. This is to keep
RAM usage low for many documents and many fields with norms (since it's
#fieldsWithNorms * #documents bytes in RAM).
So this is lossy: basically you can think of there being only 256
possible values. So when you increased the number of terms only
slightly by changing your analysis, this happened to bump you over the
edge rounding you up to the next value.

more information:
http://lucene.apache.org/core/3_6_0/scoring.html
http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Similarity.html

by the way: if you don't like this:
1. if you can still live with a single byte, maybe plug in your own
Similarity class into 3.6, overriding decodeNormValue/encodeNormValue.
For example, you could use a different SmallFloat configuration that
has less range but more precision for your use case (if your docs are
all short or whatever)
2. otherwise, if you feel you need more than a single byte, check out
4.0-ALPHA: you aren't limited to a single byte there.

-- 
lucidimagination.com


RE: How to setup SimpleFSDirectoryFactory

2012-07-19 Thread Uwe Schindler
Read this, then you will see that MMapDirectory will use 0% of your Java Heap 
space or free system RAM:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: William Bell [mailto:billnb...@gmail.com]
> Sent: Tuesday, July 17, 2012 6:05 AM
> Subject: How to setup SimpleFSDirectoryFactory
> 
> We all know that MMapDirectory is fastest. However, we cannot always use
> it, since you might run out of memory on large indexes, right?
> 
> Here is how I got SimpleFSDirectoryFactory to work. Just set
> -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
> 
> Your solrconfig.xml:
> 
> <directoryFactory name="DirectoryFactory"
> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
> 
> You can check it with http://localhost:8983/solr/admin/stats.jsp
> 
> Notice that the default for 64-bit Windows is MMapDirectory; otherwise it
> is NIOFSDirectory, except on other Windows setups, where it is
> SimpleFSDirectory. It would be nicer if we just set it all up
> with a helper in solrconfig.xml...
> 
> if (Constants.WINDOWS) {
>   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
>     return new MMapDirectory(path, lockFactory);
>   else
>     return new SimpleFSDirectory(path, lockFactory);
> } else {
>   return new NIOFSDirectory(path, lockFactory);
> }
> 
> 
> 
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076




Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Aaron Daubman
Robert,

> I have a solr 1.4.1 instance and a solr 3.6.0 instance, both configured as
> > identically as possible (given deprecations) and indexing the same
> document.
>
> Why did you do this? If you want the exact same scoring, use the exact
> same analysis.
> This means specifying luceneMatchVersion = 2.9, and the exact same
> analysis components (even if deprecated).
>
> > I have taken the field values for the example below and run them
> > through /admin/analysis.jsp on each solr instance. Even for the
> problematic
> > docs/fields, the results are almost identical. For the example below, the
> > t_tag values for the problematic doc:
> > 1.4.1: 162 values
> > 3.6.0: 164 values
> >
>
> This is why: you changed your analysis.
>

Apologies if I didn't clearly state my goal/concern: I am not looking for
the exact same scoring - I am looking to explain scoring differences.
 Deprecated components will eventually go away, time moves on, etc...
etc... I would like to be able to run current code, and should be able to -
the part that is sticking is being able to *explain* the difference in
results.

As you can see from my email, after running the different analysis on the
input, the output does not demonstrate (in any way that I can see) why the
fieldNorm values would be so different. Even with the different analysis,
the results are almost identical - which *should* result in an almost
identical fieldNorm???

Again, the desire is not to be the same, it is to understand the difference.

Thanks,
 Aaron


Re: Solr faceting -- sort order

2012-07-19 Thread Michael Della Bitta
Maybe I'm not understanding the problem, but I accomplish this by
having two fields. One for sorting, like so:
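
A sketch of such a sort type (the type name and mapping file here are
assumptions, not the poster's exact schema):

<fieldType name="alphaOnlySort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- strip accents before sorting -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- keep the whole value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>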

And then a string type field for faceting. Use a copyField directive
to get the same data in both, and then sort on the sort field, and
facet on the string field. The MappingCharFilterFactory removes
accents for sorting, so you don't have to worry about accented
characters sorting out of order.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jul 19, 2012 at 4:37 AM, Toke Eskildsen  
wrote:
> On Wed, 2012-07-18 at 20:30 +0200, Christopher Gross wrote:
>> When I do a query, the results that come through retain their original
>> case for this field, like:
>> doc 1
>> keyword: Blah Blah Blah
>> doc 2
>> keyword: Yadda Yadda Yadda
>>
>> But when I pull back facets, i get:
>>
>> blah blah blah (1)
>> yadda yadda yadda (1)
>
> Yes. The results from your query are the stored values, while the
> results from your facets are the indexed ones. That's the way faceting
> works with Solr.
>
> Technically there is nothing wrong with writing a faceting system that
> uses the stored values. We did this some years back, but abandoned the
> idea. As far as I remember, it was a lot slower to initialize the
> internal structures this way. One could also do faceting fully at search
> time, by iterating all the documents and requesting the stored value for
> each of them directly from the index, but that would be very slow.
>
>> I was attempting to fix a sorting problem -- keyword "" would show
>> up after keyword "Zulu" due to the "index" sorting, so I thought that
>> I could lowercase it all to have it be in the same order.  But now it
>> is all in lower case, and I'd like it to retain the original style.
>
> Currently the lowercase trick is the only solution for plain Solr and
> even that only works as long as your field holds only a-z letters. So no
> foreign names or such.
>
> Looking forward, one solution would be to specify a custom codec for the
> facet field, where the comparator used for sorting is backed by a
> Collator that sorts the terms directly, instead of using CollatorKeys.
> It would be a bit slower for index updates, but should do what you
> require. Unfortunately I am not aware of anyone who has created such a
> codec or even how easy it is to get it to work with Solr (4.0 alpha).
>
> We have experimented with a faceting approach that allows for custom
> ordering, but it sorts upon index open and thus has a fairly long start
> up time. Besides, it it not in a proper state for production:
> https://issues.apache.org/jira/browse/SOLR-2412
>
> - Toke Eskildsen, State and University Library, Denmark
>


Re: Solr grouping / facet query

2012-07-19 Thread s215903406
Thanks for the reply. 

To clarify, the idea is to search for authors with certain specialties (eg.
political, horror, etc.) and if they have any published titles relevant to
the user's query, then display those titles next to the author's name. 

At first, I thought it would be great to have all the author's data (name,
location, bio, titles with descriptions, etc.) in one document, with each
title and description being a multivalued field; however, I have no idea
how the "relevant titles" for the user's query, as described above, could
be quickly picked from within the document and displayed.

The only solution I see is to have one doc per title and include the name,
location, bio, etc. in each one. As for the authors with no published
titles, simply add their bio data to a document with no title or
description, and when I do the "grouping", check whether the title is
blank and, if so, display "no titles found".

This could work, though I'm concerned whether having all that duplicate
bio data will affect the relevancy of the results or the speed/performance
of Solr.

Thank you.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-grouping-facet-query-tp3995787p3995974.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Importing index - Real Time or Queued?

2012-07-19 Thread Toke Eskildsen
On Thu, 2012-07-19 at 16:00 +0200, Spadez wrote:
> This seems to suggest you have to reindex Solr in its entirety and can't
> add a single document at a time - is this right?
> 
> http://stackoverflow.com/questions/11247625/apache-solr-adding-editing-deleting-records-frequently

No. What it says is that you can't change _part_ of a document. What you
need to do is send the full document each time it changes. By having a
uniqueKey, Solr will do the bookkeeping for you and delete the old
document before adding the new one.

As for the performance part of the stackoverflow discussion, note that
they are talking about 10 million documents and 10,000 updates. That's
quite far from what you've got.

- Toke Eskildsen
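
For example, re-posting the whole document with the same uniqueKey value
replaces the old version (field names here are illustrative):

<add>
  <doc>
    <field name="id">doc-42</field>
    <field name="title">Updated title</field>
    <field name="body">Updated body text</field>
  </doc>
</add>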



Re: Importing index - Real Time or Queued?

2012-07-19 Thread Michael Della Bitta
You can definitely do a single document at a time, but unless you're
using NRT, your changes won't be visible until you do a commit. Doing
a commit involves closing searchers and reopening them, which is
semi-expensive; depending on how you're doing caching, you wouldn't want
to do it too frequently. However, your index is so small that you should
easily be able to get away with doing a commit every minute or so,
depending on traffic.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jul 19, 2012 at 10:00 AM, Spadez  wrote:
> This seems to suggest you have to reindex Solr in its entirety and can't
> add a single document at a time - is this right?
>
> http://stackoverflow.com/questions/11247625/apache-solr-adding-editing-deleting-records-frequently
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Importing-index-Real-Time-or-Queued-tp3995936p3995964.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Importing index - Real Time or Queued?

2012-07-19 Thread Spadez
This seems to suggest you have to reindex Solr in its entirety and can't
add a single document at a time - is this right?

http://stackoverflow.com/questions/11247625/apache-solr-adding-editing-deleting-records-frequently



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Mark Miller
we really need to resolve that issue soon...

On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:

> Yury,
> 
> Thank you so much! That was it. Man, I spent a good long while trouble
> shooting this. Probably would have spent quite a bit more time. I
> appreciate your help!!
> 
> -Briggs
> 
> On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats  wrote:
> 
>> On 7/18/2012 7:11 PM, Briggs Thompson wrote:
>>> I have realized this is not specific to SolrJ but to my instance of
>> Solr. Using curl to delete by query is not working either.
>> 
>> Can be this: https://issues.apache.org/jira/browse/SOLR-3432
>> 

- Mark Miller
lucidimagination.com


Re: Solr grouping / facet query

2012-07-19 Thread Erick Erickson
I'm not sure your point <3> makes sense. If you're searching by
author, how do you define "the four most relevant titles"? Relevant
to what?

If you are searching text of the publications, then displaying authors with
no publications seems unhelpful.

If you're searching the bios, how do you define "relevant titles"? Or are
relevant titles based on some criteria other than what you're searching on?

But don't get stuck worrying about duplicate data; denormalizing data is
a common practice in Solr/Lucene.

But I'm at something of a loss until you clarify what "relevant
titles" means when
searching for authors.

Best
Erick

On Wed, Jul 18, 2012 at 2:36 PM, s215903406 wrote:
> Could anyone suggest the options available to handle the following situation:
>
> 1. Say we have 1,000 authors
>
> 2. 65% of these authors have 10-100 titles they authored; the others have
> not authored any titles but provide only their biography and writing
> capability.
>
> 3. We want to search for authors, group the results by author, and show the
> 4 most relevant titles authored for each (if any) next to the author name.
>
> Since not all authors have titles authored, I can't group titles by author.
> Also, adding their bio to each title places a lot of duplicate data in the
> index.
>
> So the search results would look like this;
>
> Author A
> title0, title6, title8, title3
>
> Author G
> no titles found
>
> Author E
> title4, title9, title2
>
> Any suggestions would be appreciated!
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-grouping-facet-query-tp3995787.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 4 ALPHA /terms /browse

2012-07-19 Thread Mark Miller
Can you file two JIRA issues for these?

bq. but does return reasonable results when distrib is turned off like so

It should default to distrib=false - I don't think /terms is distrib 
aware/compatible.

bq. /browse returns this stack trace to the browser HTTP ERROR 500

We may be able to fix this.

On Jul 18, 2012, at 8:42 PM, Nick Koton wrote:

> When I setup a 2 shard cluster using the example and run it through its
> paces, I find two features that do not work as I expect.  Any suggestions on
> adjusting my configuration or expectations would be appreciated.
> 
> /terms does not return any terms when issued as follows:
> http://hostname:8983/solr/terms?terms.fl=name&terms=true&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s
> but does return reasonable results when distrib is turned off like so
> http://hostname:8983/solr/terms?terms.fl=name&terms=true&distrib=false&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s
> 
> /browse returns this stack trace to the browser
> HTTP ERROR 500
> 
> Problem accessing /solr/browse. Reason:
> 
> {msg=ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode,
> trace=org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode
>   at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:99)
>   at org.apache.solr.response.VelocityResponseWriter.getEngine(VelocityResponseWriter.java:117)
>   at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:40)
>   at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.write(SolrCore.java:1990)
>   at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:398)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>   at org.eclipse.jetty.server.Server.handle(Server.java:351)
>   at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
>   at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
>   at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
>   at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
>   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
>   at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
>   at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>   at java.lang.Thread.run(Thread.java:662)
> ,code=500}
> 
> Best regards,
> Nick Koton
> 
> 
> 

- Mark Miller
lucidimagination.com



Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Robert Muir
On Thu, Jul 19, 2012 at 12:10 AM, Aaron Daubman  wrote:
> Greetings,
>
> I've been digging in to this for two days now and have come up short -
> hopefully there is some simple answer I am just not seeing:
>
> I have a solr 1.4.1 instance and a solr 3.6.0 instance, both configured as
> identically as possible (given deprecations) and indexing the same document.

Why did you do this? If you want the exact same scoring, use the exact
same analysis.
This means specifying luceneMatchVersion = 2.9, and the exact same
analysis components (even if deprecated).
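
(For reference: in a stock 3.x solrconfig.xml that is controlled by an element
near the top of the file, e.g. <luceneMatchVersion>LUCENE_29</luceneMatchVersion>
- check your own config, since layouts vary.)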

> I have taken the field values for the example below and run them
> through /admin/analysis.jsp on each solr instance. Even for the problematic
> docs/fields, the results are almost identical. For the example below, the
> t_tag values for the problematic doc:
> 1.4.1: 162 values
> 3.6.0: 164 values
>

This is why: you changed your analysis.

-- 
lucidimagination.com


Re: Indexing data in csv format

2012-07-19 Thread Erick Erickson
Check your csv file for extraneous data? The other thing to do is look at
your logs to see if more informative information is there.


There's really very little info to go on here; you might review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick


On Tue, Jul 17, 2012 at 10:05 AM, gopes wrote:
>
> Hi ,
>
> I am trying to index data in csv format. But while indexing I get this
> following message -
>
> 
> HTTP ERROR 404
>
> Problem accessing /solr/update/csv. Reason:
> NOT_FOUND (Powered by Jetty://)
>
> solrconfig.xml has the following entries for CSVRequestHandler
> <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy">
>   <lst name="defaults">
>     <str name="separator">;</str>
>     <str name="header">true</str>
>     <str name="fieldnames">publish_date</str>
>     <str name="encapsulator">"</str>
>   </lst>
> </requestHandler>
>
> Thanks,
> Sarala
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-data-in-csv-format-tp3995549.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Result docs missing only when shards parameter present in query?

2012-07-19 Thread Erick Erickson
A multiValued uniqueKey really doesn't make any sense. But your
log file should have something in it like this:
SEVERE: uniqueKey should not be multivalued
although it _is_ a bit hard to see on startup unless you've suppressed
the INFO level output.

See: https://issues.apache.org/jira/browse/SOLR-1570

Best
Erick

On Tue, Jul 17, 2012 at 9:24 AM, Bill Havanki  wrote:
> I had the same problem as the original poster did two years ago (!), but
> with Solr 3.4.0:
>
>> I cannot get hits back and do not get a correct total number of records
> when using shard searching.
>
> When performing a sharded query, I would get empty / missing results - no
> documents at all. Querying each shard individually worked, but anything
> with the "shards" parameter yielded no result documents.
>
> I was able to get results back by updating my schema to include
> multiValued="false" for the unique key field.
>
> The problem I was seeing was that, when Solr was formulating the queries to
> go get records from each shard, it was including square brackets around the
> ids it was asking for, e.g.:
>
> ...q=123&ids=[ID1],[ID2],[ID3]&...
>
> I delved into the Solr code and saw that this query string was being formed
> (in QueryComponent.createRetrieveDocs()) by simply calling toString() on
> the unique key field value for each document it wanted to get. My guess is
> that the value objects somehow were ArrayLists (or something like that) and
> not Strings, so those annoying square brackets showed up via toString(). By
> emphasizing in the schema that the field was single-valued, those lists
> would hopefully stop appearing, and I think they did. At least the brackets
> went away.
>
> Here's the relevant QueryComponent code (again, 3.4.0 - it's the same in
> 3.6.0, didn't check 4):
>
> ArrayList<String> ids = new ArrayList<String>(shardDocs.size());
> for (ShardDoc shardDoc : shardDocs) {
>   // TODO: depending on the type, we may need more than a simple toString()?
>   ids.add(shardDoc.id.toString());
> }
> sreq.params.add(ShardParams.IDS, StrUtils.join(ids, ','));
>
> The comment in there seems to fit my theory. :)
>
> Bill
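
In schema.xml terms, the fix Bill describes boils down to declaring the key
field explicitly single-valued (field name "id" assumed; adjust the type to
your schema):

  <field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
  <uniqueKey>id</uniqueKey>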


DIH is doubling field entries

2012-07-19 Thread Bernd Fehling
While porting from 3.6.1 to 4.x I noticed that the content of some fields in
my index is doubled. I didn't have this with 3.6.1.
This can also be seen with luke. I could trace it down to DIH so far.

Anyone seen this?

I'm using XPathEntityProcessor with RegexTransformer.
Will look into this closer tomorrow and try to create an example.

Bernd


Re: Importing index - Real Time or Queued?

2012-07-19 Thread Toke Eskildsen
On Thu, 2012-07-19 at 13:49 +0200, Spadez wrote:
> It does seem really poor design to reimport 10,000 documents, when only one
> needs to be added. I dont like that, can you not insert a specific entry
> into Solr rather than reimporting everything?

Isn't that what you outlined in your option #1?

What you're looking for is probably uniqueKey:
https://wiki.apache.org/solr/UniqueKey

- Toke Eskildsen



Does defType overrides other settings for default request handler

2012-07-19 Thread amitesh116
Hi,

We have used *dismax* in our Solr config with defaultOperator="OR" and
some *mm* settings. Recently, we have started using *defType=edismax* in
query params. With this change, we have observed a significant drop in the
results count. We suspect that Solr is using default operator="AND" and
hence reducing the results count. Please confirm whether our suspicion is
correct, or are we missing something?
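
(A sketch that may help while comparing: spelling the parameters out per
request makes the effective operator visible, e.g. &defType=edismax&q.op=OR&mm=1
versus &defType=edismax&q.op=AND. The values here are illustrative, not taken
from our config.)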



Re: join this mailing list

2012-07-19 Thread Gora Mohanty
On 19 July 2012 10:15, 晋鹏(Tomsdinary)  wrote:
>
> Hi
>  I wait to join this mailing list.

Please see the very first entry under
http://lucene.apache.org/solr/discussion.html

Regards,
Gora


join this mailing list

2012-07-19 Thread Tomsdinary
Hi
 I want to join this mailing list.





Re: Importing index - Real Time or Queued?

2012-07-19 Thread Spadez
Thank you for the reply. OK, well that brings up another question. I don't
like premature optimisation, but I also don't like inefficiency, so let's
see if I can strike a balance.

It does seem like really poor design to reimport 10,000 documents when only
one needs to be added. I don't like that; can you not insert a specific
entry into Solr rather than reimporting everything?




NGram Indexing Basic Question

2012-07-19 Thread Husain, Yavar
I have set some of my fields to be NGram indexed, and have also set the
analyzer at both query and index level.

Most of the stuff works fine except for use cases where I simply interchange
a couple of characters.

For an example: "springfield" retrieves correct matches, "springfi" retrieves 
correct matches, "ingfield" retrieves correct matches.

However, when I say "springfiedl" it returns 0 results. I debugged and found
that at query/index level I have all the correct N-grams stored. So ideally
it should match "springfie" (which is there both in the query N-grams and
the index N-grams) and return the correct results.

As I was busy, I did not get time to look at the NGram code. What actually
happens when I use NGram at query level? Does it split the string into
N-grams and then send each of them to the Solr server?

Thanks Sahi for your help yesterday. Appreciate that.





Re: Importing index - Real Time or Queued?

2012-07-19 Thread Toke Eskildsen
On Thu, 2012-07-19 at 12:54 +0200, Spadez wrote:
> I want to import any new SQL results onto the server as quickly as possible
> so they are searchable but I dont want to overload the server. These are my
> new options:
> 
> 1. Devise a script to run when a new SQL item is posted, to immediatly
> import only the new SQL record to Solr

Unless you have really complex documents, which does not sound likely
for an auction site, 20,000 entries and 100 changes is a tiny index in
the Lucene/Solr world.

It sounds like you're optimizing prematurely: Go with option 1 and
expect updates to take a few seconds without the server straining.

- Toke Eskildsen



maxScore returned with distributed search

2012-07-19 Thread Markus Jelsma
Hi,

Why is maxScore always returned with distributed search? It used to be
returned only if score was part of fl. Bug? Feature?

Thanks
Markus


Importing index - Real Time or Queued?

2012-07-19 Thread Spadez
Hi,

Let's say I am running an auction site. There are 20,000 entries. 100
entries come from an on-site SQL database; the rest come from a txt file
generated from scraped content.

I want to import any new SQL results onto the server as quickly as possible
so they are searchable, but I don't want to overload the server. These are
my new options:

1. Devise a script that runs when a new SQL item is posted, to immediately
import only the new SQL record into Solr
2. Run a cron script on the hour to import the whole SQL database
3. Run a cron script on the hour to import everything, including the SQL
entries and the large txt file with all the scraped results.

I would really like to hear your feedback, because I can't get my head
around which one is the most efficient or practical solution.

James



Re: How To apply transformation in DIH for multivalued numeric field?

2012-07-19 Thread jmlucjav
I have seen that issue several times; in my case it was always with an id
field, a MySQL db, and Linux. The same config on Windows did not show the
issue.

I never got to the bottom of it... as it was an id, everything just worked
because the values were unique.



Re: Solr faceting -- sort order

2012-07-19 Thread Toke Eskildsen
On Wed, 2012-07-18 at 20:30 +0200, Christopher Gross wrote:
> When I do a query, the results that come through retain their original
> case for this field, like:
> doc 1
> keyword: Blah Blah Blah
> doc 2
> keyword: Yadda Yadda Yadda
> 
> But when I pull back facets, i get:
> 
> blah blah blah (1)
> yadda yadda yadda (1)

Yes. The results from your query are the stored values, while the
results from your facets are the indexed ones. That's the way faceting
works with Solr.

Technically there is nothing wrong with writing a faceting system that
uses the stored values. We did this some years back, but abandoned the
idea. As far as I remember, it was a lot slower to initialize the
internal structures this way. One could also do faceting fully at search
time, by iterating all the documents and requesting the stored value for
each of them directly from the index, but that would be very slow.

> I was attempting to fix a sorting problem -- keyword "" would show
> up after keyword "Zulu" due to the "index" sorting, so I thought that
> I could lowercase it all to have it be in the same order.  But now it
> is all in lower case, and I'd like it to retain the original style.

Currently the lowercase trick is the only solution for plain Solr and
even that only works as long as your field holds only a-z letters. So no
foreign names or such.
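
A sketch of how that trick is usually wired up (the field and type names are
assumed): facet on a lowercased copy and display the stored original from the
main field. Note that the facet labels themselves still come back lowercased.

  <fieldType name="string_lc" class="solr.TextField" sortMissingLast="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="keyword" type="string" indexed="true" stored="true"/>
  <field name="keyword_facet" type="string_lc" indexed="true" stored="false"/>
  <copyField source="keyword" dest="keyword_facet"/>

Then facet with facet.field=keyword_facet and sort with facet.sort=index.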

Looking forward, one solution would be to specify a custom codec for the
facet field, where the comparator used for sorting is backed by a
Collator that sorts the terms directly, instead of using CollatorKeys.
It would be a bit slower for index updates, but should do what you
require. Unfortunately I am not aware of anyone who has created such a
codec or even how easy it is to get it to work with Solr (4.0 alpha).

We have experimented with a faceting approach that allows for custom
ordering, but it sorts upon index open and thus has a fairly long start
up time. Besides, it is not in a proper state for production:
https://issues.apache.org/jira/browse/SOLR-2412

- Toke Eskildsen, State and University Library, Denmark