Cannot find Solr 7.4.1 release

2021-02-18 Thread Olivier Tavard
Hi,

I wanted to download Solr 7.4.1, but I cannot find the 7.4.1 release at
http://archive.apache.org/dist/lucene/solr/ : there is Solr 7.4 and then
directly 7.5.
Of course I can build from source, but this is frustrating because I can
see that in the 7_4 branch there is a fix that I need (SOLR-12594), with
the status marked as fixed in 7.4.1 and 7.5. Everything seems to have
been prepared to release 7.4.1, but I cannot find it.
Does this release exist?

Thank you,

Olivier


ConfigSet API V2 issue with configSetProp.property present

2018-11-12 Thread Olivier Tavard
Hi,

I have an issue creating a configset with the V2 API using a configset
property.
If I enter the command:
curl -X POST -H 'Content-type: application/json' -d '{ "create":{"name":
"Test", "baseConfigSet": "myConfigSet","configSetProp.immutable":
"false"}}'  http://localhost:8983/api/cluster/configs?omitHeader=true
(the same one as in the documentation:
https://lucene.apache.org/solr/guide/7_5/configsets-api.html)
It fails with the error :
"errorMessages":["Unknown field 'configSetProp.immutable' in object : {\n
\"name\":\"Test\",\n  \"baseConfigSet\":\"myConfigSet\",\n
\"configSetProp.immutable\":\"false\"}"]}],
"msg":"Error in command payload",
"code":400}}

If I enter the same command, still with the V2 API, but without the
configSetProp.immutable property, it succeeds.

With the V1 API, there is no problem with or without the configset
property.
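
For reference, the V1 call that works can also be issued from SolrJ. A minimal sketch, assuming SolrJ 7.x, a standalone Solr at localhost, and the same configset names as above (not anything beyond what the V1 ConfigSets API documents):

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class CreateConfigSetV1 {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("action", "CREATE");
            params.set("name", "Test");
            params.set("baseConfigSet", "myConfigSet");
            params.set("configSetProp.immutable", "false");   // accepted by the V1 API
            GenericSolrRequest req =
                new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/configs", params);
            NamedList<Object> response = req.process(client).getResponse();
            System.out.println(response);
        }
    }
}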

The tests were done with Solr 7.4 and Solr 7.5.

Did I miss something about the configset property usage?

Thanks,
Best regards,
Olivier


Backup collections using SolrJ

2018-05-04 Thread Olivier Tavard
Hi,

I have a question regarding backing up a Solr collection using SolrJ. I
use Solr 7.
I want to build a JAR for that and launch it from a cron job.

So far, no problem with the request: I use
CollectionAdminRequest.backupCollection and then the processAsync method.

The command is well transmitted to Solr.

My problem is parsing the response and handling the different failure
cases in the code.

Let's say that the Solr response is the following after sending the
asynchronous backup request (the request id is "solrbackup"):

{
"responseHeader": {
"status": 0,
"QTime": 1
},
"success": {
"IP:8983_solr": {
"responseHeader": {
"status": 0,
"QTime": 0
}
},
"IP:8983_solr": {
"responseHeader": {
"status": 0,
"QTime": 0
}
}
},
"solrbackup5704378348890743": {
"responseHeader": {
"status": 0,
"QTime": 0
},
"STATUS": "failed",
"Response": "Failed to backup core=Test_shard1_replica1 because
java.io.IOException: Aucun espace disponible sur le périphérique"
},
"status": {
"state": "completed",
"msg": "found [solrbackup] in completed tasks"
}
}
If I use the code:
System.out.println(CollectionAdminRequest.requestStatus("solrbackup").process(solr).getRequestStatus());

The output is: "COMPLETED".
But that is not enough to check whether the backup actually succeeded. For
example, in this case the task is completed but the backup failed because
there was not enough space left on the disk.
So the interesting part is in the solrbackup5704378348890743 section of
the response.

My first question is: why are some numbers appended to the request-id name?

Because if I write
CollectionAdminRequest.requestStatus("solrbackup").getRequestId(), the
response is "solrbackup" and not solrbackup5704378348890743.
So retrieving the section related to solrbackup5704378348890743 in the
response is not very easy.
I cannot directly use (NamedList)
CollectionAdminRequest.requestStatus("solrbackup").process(solr).getResponse().get("solrbackup");
instead I have to iterate over the entire Solr response and check the
beginning of each key to retrieve the section that begins with solrbackup,
and finally get the elements that I want.

Is this the right approach, or is there a simpler way to do it?
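
For what it's worth, a minimal sketch of that iteration, assuming SolrJ 7.x, the "solrbackup" request id used above, and a standalone Solr at localhost (not a polished implementation):

import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.RequestStatusState;
import org.apache.solr.common.util.NamedList;

public class BackupStatusCheck {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // Overall task state: COMPLETED only means the async task finished,
            // not that the backup itself succeeded.
            RequestStatusState state = CollectionAdminRequest.requestStatus("solrbackup")
                    .process(solr).getRequestStatus();
            System.out.println("Task state: " + state);

            // Walk the raw response and look for the entry whose key starts with the
            // request id; Solr appends a numeric suffix for the sub-request
            // (e.g. solrbackup5704378348890743).
            NamedList<Object> response = CollectionAdminRequest.requestStatus("solrbackup")
                    .process(solr).getResponse();
            for (Map.Entry<String, Object> entry : response) {
                if (entry.getKey().startsWith("solrbackup") && entry.getValue() instanceof NamedList) {
                    NamedList<?> section = (NamedList<?>) entry.getValue();
                    if ("failed".equals(section.get("STATUS"))) {
                        System.err.println("Backup failed: " + section.get("Response"));
                    }
                }
            }
        }
    }
}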

Thanks,
Olivier Tavard


Large multivalued field and overseer problem

2015-11-19 Thread Olivier
Hi,

We have a SolrCloud cluster with 3 nodes (4 processors, 24 GB RAM per node).
We have 3 shards per node and the replication factor is 3. We host 3
collections; the biggest is only about 40K documents.
The most important thing is a multivalued field with about 200K to 300K
values per document (each value is a kind of product reference, of type
String).
We have some very big issues with our SolrCloud cluster. It crashes
entirely, very frequently, at indexing time. It starts with an overseer
issue:

Session expired for the overseer: KeeperErrorCode = Session expired for
/overseer_elect/leader

Then another node is elected overseer, but the recovery phase seems to
fail indefinitely. It seems that communication between the overseer
and ZK is impossible.
After a short period of time, the whole cluster is unavailable (JVM
out-of-memory error) and we have to restart it.

So I wanted to know if we can keep using such a huge multivalued field with
SolrCloud.
We are on Solr 4.10.4 for now; do you think that upgrading to Solr 5, with
an overseer per collection, could fix our issues?
Or do we have to rethink the schema to avoid this very large multivalued
field?

Thanks,
Best,

Olivier


SOLR cloud (5.2.1) recovery

2015-08-18 Thread Olivier Damiot
hello,

I am a bit confused about how SolrCloud recovery is supposed to work
exactly in the case of losing a single node completely.

My 600 collections are created with
numShards=3&replicationFactor=3&maxShardsPerNode=3

However, how do I configure a new node to take the place of the dead
node, or recover if I accidentally delete the data dir?

I bring up a new node which is completely empty (empty data dir),
install Solr, and connect it to ZooKeeper. Is it supposed to work
automatically from there? All my shards/replicas on this node show as down
(I suppose because there are no cores in the data dir).

Do I need to recreate the cores first?

Can I copy/paste the data directory from another node to this one? I
think not, because I would have to rename all the variables in
core.properties that are specific to each node (like name or coreNodeName).

thanks,

Olivier Damiot


Re: Large number of collections in SolrCloud

2015-08-03 Thread Olivier
Hi,

Thanks a lot Erick and Shawn for your answers.
I am aware that it is a very particular issue and not a common use of
Solr. I just wondered if people had a similar business case. For
information, we need a very large number of collections with the same
configuration because of legal reasons. Each collection represents one of
our customers, and by contract we have to separate the data of each of
them.
If we had the choice, we would just have one collection with a field named
'Customers' and we would do filter queries on it, but we can't!

Anyway, thanks again for your answers. For now, we finally did not add the
different language dictionaries per collection, and it is fine for 1K+
customers with more resources added to the servers.

Best,

Olivier Tavard



2015-07-27 17:53 GMT+02:00 Shawn Heisey apa...@elyograg.org:

 On 7/27/2015 9:16 AM, Olivier wrote:
  I have a SolrCloud cluster with 3 nodes :  3 shards per node and
  replication factor at 3.
  The collections number is around 1000. All the collections use the same
  Zookeeper configuration.
  So when I create each collection, the ZK configuration is pulled from ZK
  and the configuration files are stored in the JVM.
  I thought that if the configuration was the same for each collection, the
  impact on the JVM would be insignificant because the configuration should
 be
  loaded only once. But it is not the case, for each collection created,
 the
  JVM size increases because the configuration is loaded again, am I
 correct ?
 
  If I have a small configuration folder size, I have no problem because
 the
  folder size is less than 500 KB so if we count 1000 collections x 500 KB,
  the JVM impact is 500 MB.
  But we manage a lot of languages with some dictionaries so the
  configuration folder size is about 6 MB. The JVM impact is very important
  now because it can be more than 6 GB (1000 x 6 MB).
 
  So I would like to have the feedback of people who have a cluster with a
  large number of collections too. Do I have to change some settings to
  handle this case better ? What can I do to optimize this behaviour ?
  For now, we just increase the RAM size per node at 16 GB but we plan to
  increase the collections number.

 Severe issues were noticed when dealing with many collections, and this
 was with a simple config, and completely empty indexes.  A complex
 config and actual index data would make it run that much more slowly.

 https://issues.apache.org/jira/browse/SOLR-7191

 Memory usage for the config wasn't even considered when I was working on
 reporting that issue.

 SolrCloud is highly optimized to work well when there are a relatively
 small number of collections.  I think there is work that we can do which
 will optimize operations to the point where thousands of collections
 will work well, especially if they all share the same config/schema ...
 but this is likely to be a fair amount of work, which will only benefit
 a handful of users who are pushing the boundaries of what Solr can do.
 In the open source world, a problem like that doesn't normally receive a
 lot of developer attention, and we rely much more on help from the
 community, specifically from knowledgeable users who are having the
 problem and know enough to try and fix it.

 FYI -- 16GB of RAM per machine is quite small for Solr, particularly
 when pushing the envelope.  My Solr machines are maxed at 64GB, and I
 frequently wish I could install more.

 https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

 One possible solution for your dilemma is simply adding more machines
 and spreading your collections out so each machine's memory requirements
 go down.

 Thanks,
 Shawn




reload collections timeout

2015-08-03 Thread olivier

Hi everybody,

I have about 1300 collections, 3 shards, replicationFactor = 3,
maxShardsPerNode = 3.
I have 3 boxes with 64 GB of RAM (32 GB for the JVM).

When I want to reload all my collections, I get a timeout error.
Is there a way to do the reload asynchronously, as for creating collections
(async=requestid)?
I saw in this issue that it was done, but it did not seem to work.

https://issues.apache.org/jira/browse/SOLR-5477

How do I use the async mode to reload collections?
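
For the client side, a rough sketch of what the async pattern looks like with a newer SolrJ (6+); the collection name and host are placeholders, and whether RELOAD honours the async parameter server-side depends on the Solr version, which is exactly the open question here:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.RequestStatusState;

public class AsyncReload {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // Submit the RELOAD with an async id and return immediately
            String asyncId = CollectionAdminRequest.reloadCollection("collection1")
                    .processAsync("reload-collection1", client);

            // Poll the request status instead of blocking on a long synchronous call
            RequestStatusState state;
            do {
                Thread.sleep(1000);
                state = CollectionAdminRequest.requestStatus(asyncId)
                        .process(client).getRequestStatus();
            } while (state == RequestStatusState.SUBMITTED || state == RequestStatusState.RUNNING);
            System.out.println("Reload finished with state: " + state);
        }
    }
}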

thanks a lot

Olivier Damiot



Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Eric for your reply.
If I understand correctly, these approaches use the index to hold the
terms. As the index grows bigger, this can become a performance issue.
Is that right? Could you please look at this article
http://www.norconex.com/serving-autocomplete-suggestions-fast/ to see
what I mean? Thank you.

Regards
Olivier


2015-08-01 17:42 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 Well, defining what you mean by autocomplete would be a start. If it's
 just
 a user types some letters and you suggest the next N terms in the list,
 TermsComponent will fix you right up.

 If it's more complicated, the AutoSuggest functionality might help.

 If it's correcting spelling, there's the spellchecker.

 Best,
 Erick

 On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina
 olivier.aust...@gmail.com wrote:
  Hi,
 
  I am looking for a fast and easy to maintain way to do autocomplete for
  large dataset in solr. I heard about Ternary Search Tree (TST)
  https://en.wikipedia.org/wiki/Ternary_search_tree.
  But I would like to know if there is something I missed such as best
  practice, Solr new feature. Any suggestion is welcome. Thank you.
 
  Regards
  Olivier



Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Eric,

I would like to implement autocomplete for a large dataset. The
autocomplete should show the phrase or the question the user wants as the
user types. The requirement is that the autocomplete should be fast (not
slowed down by the volume of data as the dataset becomes bigger) and easy
to maintain. The autocomplete can have its own Solr server. It is an
autocomplete like any other, but above all it must be fast and easy to maintain.

What are the limitations of the suggesters mentioned in the article? Thank you.

Regards
Olivier


2015-08-01 19:41 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 Not really. There's no need to use ngrams as the article suggests if the
 terms component does what you need. Which is why I asked you about what
 autocomplete means in your context. Which you have not clarified. Have you
 even looked at terms component?  Especially the terms.prefix option?

 Terms component has its limitations, but performance isn't one of them.
 The suggesters mentioned in the article have other limitations. It's really
 useless to discuss those limitations, though, until the problem you're
 trying to solve is clearly stated.
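
As a concrete illustration of terms.prefix, a minimal SolrJ sketch against the techproducts example; the core URL, field name, and prefix are assumptions, not anything from this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class PrefixSuggest {
    public static void main(String[] args) throws Exception {
        // The /terms handler must have the TermsComponent enabled (it does in the example configs)
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
            SolrQuery q = new SolrQuery();
            q.setRequestHandler("/terms");
            q.set("terms", "true");
            q.set("terms.fl", "name");       // field to pull terms from
            q.set("terms.prefix", "ip");     // what the user has typed so far
            q.set("terms.limit", "10");
            QueryResponse rsp = client.query(q);
            TermsResponse terms = rsp.getTermsResponse();
            for (TermsResponse.Term t : terms.getTerms("name")) {
                System.out.println(t.getTerm() + " (" + t.getFrequency() + ")");
            }
        }
    }
}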
 On Aug 1, 2015 1:01 PM, Olivier Austina olivier.aust...@gmail.com
 wrote:

  Thank you Eric for your reply.
  If I understand it seems that these approaches are using index to hold
  terms. As the index grows bigger, it can be a performance issues.
  Is it right? Please can you check this article
  http://www.norconex.com/serving-autocomplete-suggestions-fast/ to see
  what I mean?   Thank you.
 
  Regards
  Olivier
 
 
  2015-08-01 17:42 GMT+02:00 Erick Erickson erickerick...@gmail.com:
 
   Well, defining what you mean by autocomplete would be a start. If
 it's
   just
   a user types some letters and you suggest the next N terms in the list,
   TermsComponent will fix you right up.
  
   If it's more complicated, the AutoSuggest functionality might help.
  
   If it's correcting spelling, there's the spellchecker.
  
   Best,
   Erick
  
   On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina
   olivier.aust...@gmail.com wrote:
Hi,
   
I am looking for a fast and easy to maintain way to do autocomplete
 for
large dataset in solr. I heard about Ternary Search Tree (TST)
https://en.wikipedia.org/wiki/Ternary_search_tree.
But I would like to know if there is something I missed such as best
practice, Solr new feature. Any suggestion is welcome. Thank you.
   
Regards
Olivier
  
 



Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Hi,

I am looking for a fast and easy-to-maintain way to do autocomplete for a
large dataset in Solr. I heard about Ternary Search Trees (TST):
https://en.wikipedia.org/wiki/Ternary_search_tree.
But I would like to know if there is something I missed, such as a best
practice or a new Solr feature. Any suggestion is welcome. Thank you.

Regards
Olivier


Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Eric for your replies and the link.

Regards
Olivier


2015-08-02 3:47 GMT+02:00 Erick Erickson erickerick...@gmail.com:

 Here's some background:

 http://lucidworks.com/blog/solr-suggester/

 Basically, the limitation is that to build the suggester all docs in
 the index need to be read to pull out the stored field and build
 either the FST or the sidecar Lucene index, which can be a _very_
 costly operation (as in minutes/hours for a large dataset).

 bq: The requirement is that the autocomplete should be fast (not
 slowdown by the volume of data as dataset become bigger)

 Well, in some alternate universe this may be possible. But the larger
 the corpus the slower the processing will be, there's just no way
 around that. Whether it's fast enough for your application is a better
 question ;).

 Best,
 Erick


 On Sat, Aug 1, 2015 at 2:05 PM, Olivier Austina
 olivier.aust...@gmail.com wrote:
  Thank you Eric,
 
  I would like to implement an autocomplete for large dataset.  The
  autocomplete should show the phrase or the question the user want as the
  user types. The requirement is that the autocomplete should be fast (not
  slowdown by the volume of data as dataset become bigger), and easy to
  maintain. The autocomplete can have its own Solr server.  It is an
  autocomplete like others but it should be only fast and easy to maintain.
 
  What is the limitations of suggesters mentioned in the article? Thank
 you.
 
  Regards
  Olivier
 
 
  2015-08-01 19:41 GMT+02:00 Erick Erickson erickerick...@gmail.com:
 
  Not really. There's no need to use ngrams as the article suggests if the
  terms component does what you need. Which is why I asked you about what
  autocomplete means in your context. Which you have not clarified. Have
 you
  even looked at terms component?  Especially the terms.prefix option?
 
  Terms component has it's limitations, but performance isn't one of them.
  The suggesters mentioned in the article have other limitations. It's
 really
  useless to discuss those limitations, though, until the problem you're
  trying to solve is clearly stated.
  On Aug 1, 2015 1:01 PM, Olivier Austina olivier.aust...@gmail.com
  wrote:
 
   Thank you Eric for your reply.
   If I understand it seems that these approaches are using index to hold
   terms. As the index grows bigger, it can be a performance issues.
   Is it right? Please can you check this article
   http://www.norconex.com/serving-autocomplete-suggestions-fast/ to
 see
   what I mean?   Thank you.
  
   Regards
   Olivier
  
  
   2015-08-01 17:42 GMT+02:00 Erick Erickson erickerick...@gmail.com:
  
Well, defining what you mean by autocomplete would be a start. If
  it's
just
a user types some letters and you suggest the next N terms in the
 list,
TermsComponent will fix you right up.
   
If it's more complicated, the AutoSuggest functionality might help.
   
If it's correcting spelling, there's the spellchecker.
   
Best,
Erick
   
On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina
olivier.aust...@gmail.com wrote:
 Hi,

 I am looking for a fast and easy to maintain way to do
 autocomplete
  for
 large dataset in solr. I heard about Ternary Search Tree (TST)
 https://en.wikipedia.org/wiki/Ternary_search_tree.
 But I would like to know if there is something I missed such as
 best
 practice, Solr new feature. Any suggestion is welcome. Thank you.

 Regards
 Olivier
   
  
 



Leader election

2015-07-29 Thread Olivier Damiot
Hello everybody,

I use Solr 5.2.1 and am having a big problem.
I have about 1200 collections, 3 shards, replicationFactor = 3,
maxShardsPerNode = 3.
I have 3 boxes with 64 GB of RAM (32 GB for the JVM).
I have no problems with collection creation or indexing, but when I
lose a node (VM full or kill) and restart it, all my collections are down.
Looking in the logs I can see leader election problems, e.g.:
  - Checking if I (core = test339_shard1_replica1, coreNodeName =
core_node5) should try and be the leader.
  - Cloud says we are still state leader.

I feel that all the servers are passing the buck!

I do not understand this error, especially since, reading the mailing list,
I have the impression that this bug was solved long ago.

What should I do to start my collections properly?

Could someone help me?

thank you a lot

Olivier


Large number of collections in SolrCloud

2015-07-27 Thread Olivier
Hi,

I have a SolrCloud cluster with 3 nodes: 3 shards per node and a
replication factor of 3.
The number of collections is around 1000. All the collections use the same
ZooKeeper configuration.
So when I create each collection, the configuration is pulled from ZK
and the configuration files are stored in the JVM.
I thought that if the configuration was the same for each collection, the
impact on the JVM would be insignificant because the configuration should be
loaded only once. But that is not the case: for each collection created, the
JVM size increases because the configuration is loaded again. Am I correct?

If I have a small configuration folder, I have no problem: the folder size
is less than 500 KB, so if we count 1000 collections x 500 KB, the JVM
impact is 500 MB.
But we manage a lot of languages with dictionaries, so the configuration
folder size is about 6 MB. The JVM impact is now very significant because
it can be more than 6 GB (1000 x 6 MB).

So I would like to have feedback from people who have a cluster with a
large number of collections too. Do I have to change some settings to
handle this case better? What can I do to optimize this behaviour?
For now, we just increased the RAM size per node to 16 GB, but we plan to
increase the number of collections.

Thanks,

Olivier


Re: Dereferencing boost values?

2015-07-14 Thread Olivier Lebra

Thanks guys...
I'm using edismax, and I have a long bf parameter that I want as a default
in a Solr requestHandler config, but customizable via the query string,
something like this:

<requestHandler>
  <lst name="defaults">
    <str name="bf">product(a,$a)^$fa sum(b,$b1,$b2)^$fb c^$fc ...</str>

where the caller would pass $a, $fa, $b1, $b2, $fb, $fc (and a, b, c are
numeric fields).


So my problem is with $fa, $fb, and $fc. Solr doesn't take that syntax.

For numeric operands, is the dismax boost operator ^ just a pow()? If
so, my problem is solved by doing this:
<str name="bf">pow(product(a,$a1),$fa) pow(sum(b,$b1,$b2),$fb) pow(c,$fc)</str>

Is a^b equivalent to pow(a,b)?

Thanks,
Olivier


On 7/14/2015 2:31 PM, Chris Hostetter wrote:

To clarify the difference:

- bf is a special param of the dismax parser, which does an *additive*
boost function - that function can be something as simple as a numeric
field

- alternatively, you can use the boost parser in your main query string,
to wrap any parser (dismax, edismax, standard, whatever) in a
*multiplicative* boost, where the boost function can be anything

- multiplicative boosts are almost always what people really want; additive
boosts are a lot less useful.

- when specifying any function, you can use variable dereferencing for any
function params.

So in the example Upayavira gave, you can use any arbitrary query param to
specify the function to use as a multiplicative boost around an arbitrary
query -- which could still use dismax if you want (just specify the
necessary parser type as a localparam on the inner query, or use a
defType localparam on the original boost query).  Or you could explicitly
specify a function that incorporates a field value with some other
dynamic params, and use that entire function as your multiplicative boost.

a more elaborate example using the bin/solr -e techproducts data...

http://localhost:8983/solr/techproducts/query?debug=query&q={!boost%20b=$boost_func%20defType=dismax%20v=$qq}&qf=name+title&qq=apple%20ipod&boost_func=pow%28$boost_field,$boost_factor%29&boost_field=price&boost_factor=2

 "params":{
   "qq":"apple ipod",
   "q":"{!boost b=$boost_func defType=dismax v=$qq}",
   "debug":"query",
   "qf":"name title",
   "boost_func":"pow($boost_field,$boost_factor)",
   "boost_factor":"2",
   "boost_field":"price"}},







: Date: Tue, 14 Jul 2015 21:58:36 +0100
: From: Upayavira u...@odoko.co.uk
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Dereferencing boost values?
:
: You could do
:
: q={!boost b=$b v=$qq}
: qq=your query
: b=YOUR-FACTOR
:
: If what you want is to provide a value outside.
:
: Also, with later Solrs, you can use ${whatever} syntax in your main
: query, which might work for you too.
:
: Upayavira
:
: On Tue, Jul 14, 2015, at 09:28 PM, Olivier Lebra wrote:
:  Is there a way to do something like this:  bf=myfield^$myfactor  ?
:  (Doesn't work, the boost value has to be a direct number)
: 
:  Thanks,
:  Olivier
:

-Hoss
http://www.lucidworks.com/




Dereferencing boost values?

2015-07-14 Thread Olivier Lebra
Is there a way to do something like this:  bf=myfield^$myfactor  ?
(Doesn't work, the boost value has to be a direct number)

Thanks,
Olivier


How to dereference boost values?

2015-07-14 Thread Olivier Lebra
Is it possible to do something like this: bf=myfield^$myfactor

Thanks,
Olivier


Re: How to implement Auto complete, suggestion client side

2015-01-28 Thread Olivier Austina
Hi,

Thank you Dan Davis and Alexandre Rafalovitch. This is very helpful for me.

Regards
Olivier


2015-01-27 0:51 GMT+01:00 Alexandre Rafalovitch arafa...@gmail.com:

 You've got a lot of options depending on what you want. But since you
 seem to just want _an_ example, you can use mine from
 http://www.solr-start.com/javadoc/solr-lucene/index.html (gray search
 box there).

 You can see the source for the test screen (using Spring Boot and
 Spring Data Solr as a middle-layer) and Select2 for the UI at:
 https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer.
 The Solr definition is at:

 https://github.com/arafalov/Solr-Javadoc/tree/master/JavadocIndex/JavadocCollection/conf

 Other implementation pieces are in that (and another) public
 repository as well, but it's all in Java. You'll probably want to do
 something similar in PHP.

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 26 January 2015 at 17:11, Olivier Austina olivier.aust...@gmail.com
 wrote:
  Hi All,
 
  I would say I am new to web technology.
 
  I would like to implement auto complete/suggestion in the user search box
  as the user type in the search box (like Google for example). I am using
  Solr as database. Basically I am  familiar with Solr and I can formulate
  suggestion queries.
 
  But now I don't know how to implement suggestion in the User Interface.
  Which technologies should I need. The website is in PHP. Any suggestions,
  examples, basic tutorial is welcome. Thank you.
 
 
 
  Regards
  Olivier



How to implement Auto complete, suggestion client side

2015-01-26 Thread Olivier Austina
Hi All,

I would say I am new to web technology.

I would like to implement autocomplete/suggestions in the user search box
as the user types in the search box (like Google, for example). I am using
Solr as the database. Basically I am familiar with Solr and I can formulate
suggestion queries.

But now I don't know how to implement suggestions in the user interface.
Which technologies do I need? The website is in PHP. Any suggestions,
examples, or basic tutorials are welcome. Thank you.



Regards
Olivier


Architecture for PHP web site, Solr and an application

2014-12-26 Thread Olivier Austina
Hi,

I would like to query only some fields in Solr depending on the user input,
as I know the fields.

The user sends an HTML form to the PHP website. The application gets the
fields and their content from the PHP website. The application then
formulates a query to Solr based on these fields and other contextual
information. Only fields from the HTML form are used. The forms don't have
the same fields. The application is not yet developed. It could be in C++,
Java or another language using a database. It uses more resources.

I am wondering which architecture is suitable for this case:
- How to make the architecture scalable (to support more users)
- How to make PHP communicate with the application if this application is
not in PHP.

Any suggestion is welcome. Thank you.

 Regards
Olivier


UI for Solr

2014-12-23 Thread Olivier Austina
Hi,

I would like to build a user interface on top of Solr for PC and mobile. I
am wondering if there is a framework or best practice commonly used. I want
Solr features such as suggestions, autocomplete and facets to be available
in the UI. Any suggestion is welcome. Thank you.

Regards
Olivier


Re: UI for Solr

2014-12-23 Thread Olivier Austina
Hi Alex,

Thank you for the prompt reply. I was not aware of Spring.io's Spring Data Solr.

Regards
Olivier


2014-12-23 16:50 GMT+01:00 Alexandre Rafalovitch arafa...@gmail.com:

 You don't expose Solr directly to the user; it is not set up for
 foolproof security out of the box. So you would need a client to talk
 to Solr.

 Something like Spring.io's Spring Data Solr could be one of the things
 to check. You can see an auto-complete example for it at:
 https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer/src/main
 and embedded in action at
 http://www.solr-start.com/javadoc/solr-lucene/index.html (search box
 on the top)

 Regards,
Alex.
 
 Sign up for my Solr resources newsletter at http://www.solr-start.com/


 On 23 December 2014 at 10:45, Olivier Austina olivier.aust...@gmail.com
 wrote:
  Hi,
 
  I would like to build a User Interface on top of Solr for PC and mobile.
 I
  am wondering if there is a framework, best practice commonly used. I want
  Solr features such as suggestion, auto complete, facet to be available
 for
  UI. Any suggestion is welcome. Than you.
 
  Regards
  Olivier



Re: Indexing documents/files for production use

2014-10-30 Thread Olivier Austina
Thank you Alexandre, Jürgen and Erick for your replies. It is clear for me.

Regards
Olivier


2014-10-28 23:35 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 And one other consideration in addition to the two excellent responses
 so far

 In a SolrCloud environment, SolrJ via CloudSolrServer will automatically
 route the documents to the correct shard leader, saving some additional
 overhead. Post.jar and cURL send the docs to a node, which in turn
 forwards the docs to the correct shard leader, which lowers
 throughput.
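
 A minimal sketch of that approach with the current SolrJ API, where CloudSolrClient replaced the older CloudSolrServer name; the ZooKeeper host, collection, and fields are assumptions:

 import java.util.Collections;
 import java.util.Optional;

 import org.apache.solr.client.solrj.impl.CloudSolrClient;
 import org.apache.solr.common.SolrInputDocument;

 public class CloudIndexer {
     public static void main(String[] args) throws Exception {
         // ZooKeeper address and collection name are placeholders
         try (CloudSolrClient client = new CloudSolrClient.Builder(
                 Collections.singletonList("zkhost:2181"), Optional.empty()).build()) {
             client.setDefaultCollection("mycollection");
             SolrInputDocument doc = new SolrInputDocument();
             doc.addField("id", "doc-1");
             doc.addField("title_t", "hello solrcloud");
             client.add(doc);      // the client routes this straight to the right shard leader
             client.commit();
         }
     }
 }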

 Best,
 Erick

 On Tue, Oct 28, 2014 at 2:32 PM, Jürgen Wagner (DVT)
 juergen.wag...@devoteam.com wrote:
  Hello Olivier,
for real production use, you won't really want to use any toys like
  post.jar or curl. You want a decent connector to whatever data source
 there
  is, that fetches data, possibly massages it a bit, and then feeds it into
  Solr - by means of SolrJ or directly into the web service of Solr via
 binary
  protocols. This way, you can properly handle incremental feeding,
 processing
  of data from remote locations (with the connector being closer to the
 data
  source), and also source data security. Also think about what happens if
 you
  do processing of incoming documents in Solr. What happens if Tika runs
 out
  of memory because of PDF problems? What if this crashes your Solr node?
 In
  our Solr projects, we generally do not do any sizable processing within
 Solr
  as document processing and document indexing or querying have all
 different
  scaling properties.
 
  Production use most typically is not achieved by deploying a vanilla
 Solr,
  but rather having a bit more glue and wrappage, so the whole will fit
 your
  requirements in terms of functionality, scaling, monitoring and
 robustness.
  Some similar platforms like Elasticsearch try to alleviate these pains of
  going to a production-style infrastructure, but that's at the expense of
  flexibility and comes with limitations.
 
  For proof-of-concept or demonstrator-style applications, the plain tools
 out
  of the box will be fine. For production applications, you want to have
 more
  robust components.
 
  Best regards,
  --Jürgen
 
 
  On 28.10.2014 22:12, Olivier Austina wrote:
 
  Hi All,
 
  I am reading the solr documentation. I have understood that post.jar
  
 http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29
 
  is not meant for production use, cURL
  
 https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing
 
  is not recommanded. Is SolrJ better for production?  Thank you.
  Regards
  Olivier
 
 
 
  --
 
  Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
  уважением
  i.A. Jürgen Wagner
  Head of Competence Center Intelligence
   Senior Cloud Consultant
 
  Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
  Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864
 1543
  E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
 
  
  Managing Board: Jürgen Hatzipantelis (CEO)
  Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
  Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
 
 



Indexing documents/files for production use

2014-10-28 Thread Olivier Austina
Hi All,

I am reading the Solr documentation. I have understood that post.jar
http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29
is not meant for production use, and cURL
https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing
is not recommended. Is SolrJ better for production? Thank you.
Regards
Olivier


OpenExchangeRates.Org rates in solr

2014-10-26 Thread Olivier Austina
Hi,

Is there a way to see the OpenExchangeRates.Org
http://www.OpenExchangeRates.Org rates used by Solr somewhere? I have
changed the configuration to use these rates. Thank you.
Regards
Olivier


Re: OpenExchangeRates.Org rates in solr

2014-10-26 Thread Olivier Austina
Hi Will,

I am learning Solr now. I can use it  later for business or for free
access. Thank you.

Regards
Olivier


2014-10-26 17:32 GMT+01:00 Will Martin wmartin...@gmail.com:

 Hi Olivier:

 Can you clarify this message? Are you using Solr at the business? Or are
 you giving free access to solr installations?

 Thanks,
 Will


 -Original Message-
 From: Olivier Austina [mailto:olivier.aust...@gmail.com]
 Sent: Sunday, October 26, 2014 10:57 AM
 To: solr-user@lucene.apache.org
 Subject: OpenExchangeRates.Org rates in solr

 Hi,

 There is a way to see the OpenExchangeRates.Org 
 http://www.OpenExchangeRates.Org rates used in Solr somewhere. I have
 changed the configuration to use these rates. Thank you.
 Regards
 Olivier




Re: Remove indexes of XML file

2014-10-25 Thread Olivier Austina
Thank you Alex, I think I can use the file to delete corresponding indexes.

Regards
Olivier


2014-10-24 21:51 GMT+02:00 Alexandre Rafalovitch arafa...@gmail.com:

 You can delete individually, all (*:* query) or by specific query. So,
 if there is no common query pattern you may need to do a multi-id
 query - something like id:(id1 id2 id3 id4) which does require you
 knowing the IDs.
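
 A minimal SolrJ sketch of that multi-id delete, assuming a current SolrJ client; the core URL and ids are placeholders for the documents that came from one XML file:

 import org.apache.solr.client.solrj.impl.HttpSolrClient;

 public class DeleteByIds {
     public static void main(String[] args) throws Exception {
         // Core URL and ids are placeholders
         try (HttpSolrClient client =
                  new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
             client.deleteByQuery("id:(id1 id2 id3 id4)");
             client.commit();
         }
     }
 }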

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 24 October 2014 15:44, Olivier Austina olivier.aust...@gmail.com
 wrote:
  Hi,
 
  This is newbie question. I have indexed some documents using some XML
 files
  as indicating in the tutorial
  http://lucene.apache.org/solr/4_10_1/tutorial.html with the command :
 
  java -jar post.jar *.xml
 
  I have seen how to delete an index for one document but how to delete
  all indexes
  for documents within an XML file. For example if I have indexed some
  files A, B, C, D etc.,
  how to delete indexes of documents from file C. Is there a command
  like above or other
  solution without using individual ID? Thank you.
 
 
  Regards
  Olivier



Remove indexes of XML file

2014-10-24 Thread Olivier Austina
Hi,

This is a newbie question. I have indexed some documents using some XML
files, as shown in the tutorial
http://lucene.apache.org/solr/4_10_1/tutorial.html, with the command:

java -jar post.jar *.xml

I have seen how to delete the index entry for one document, but how do I
delete all the indexed documents that came from one XML file? For example,
if I have indexed some files A, B, C, D etc., how do I delete the documents
that came from file C? Is there a command like the one above, or another
solution that does not use individual IDs? Thank you.


Regards
Olivier


Re: Problems for indexing large documents on SolrCloud

2014-09-22 Thread Olivier
Hi,

First, thanks for your advice.
I did several tests and finally I could index all the data on my
SolrCloud cluster.
The error was client-side; it is documented in this post:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3ccfc09ae1.94f8%25rebecca.t...@ucsf.edu%3E

"EofException from Jetty means one specific thing:  The client software
disconnected before Solr was finished with the request and sent its
response.  Chances are good that this is because of a configured socket
timeout on your SolrJ client or its HttpClient.  This might have been
done with the setSoTimeout method on the server object."

So I increased the Solarium timeout from 5 to 60 seconds and all the data
is indexed correctly now. The error was not reproducible on my development
PC because the database and Solr were on the same local virtual machine
with a lot of available resources, so indexing was faster there than on the
SolrCloud cluster.
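
For anyone hitting the same thing from Java rather than Solarium, a hedged sketch of the equivalent setting with a recent SolrJ builder (older SolrJ used setSoTimeout on the server object, as the quoted post says); the URL and values are placeholders:

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TimeoutClient {
    public static void main(String[] args) {
        // Raise the client-side timeouts (milliseconds) so long indexing requests
        // are not cut off before Solr responds
        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection")
                .withSocketTimeout(60000)
                .withConnectionTimeout(10000)
                .build();
        System.out.println("Client ready: " + client.getBaseURL());
    }
}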

Thanks,

Olivier


2014-09-11 0:21 GMT+02:00 Shawn Heisey s...@elyograg.org:

 On 9/10/2014 2:05 PM, Erick Erickson wrote:
  bq: org.apache.solr.common.SolrException: Unexpected end of input
  block; expected an identifier
 
  This is very often an indication that your packets are being
  truncated by something in the chain. In your case, make sure
  that Tomcat is configured to handle inputs of the size that you're
 sending.
 
  This may be happening before things get to Solr, in which case your
 settings
  in solrconfig.xml aren't germane, the problem is earlier than than.
 
  A semi-smoking-gun here is that there's a size of your multivalued
  field that seems to break things... That doesn't rule out time problems
  of course.
 
  But I'd look at the Tomcat settings for maximum packet size first.

 The maximum HTTP request size is actually is controlled by Solr itself
 since 4.1, with changes committed for SOLR-4265.  Changing the setting
 on Tomcat probably will not help.

 An example from my own config which sets this to 32MB - the default is
 2048, or 2MB:

 <requestParsers enableRemoteStreaming="false"
 multipartUploadLimitInKB="32768" formdataUploadLimitInKB="32768"/>

 Thanks,
 Shawn




Problems for indexing large documents on SolrCloud

2014-09-10 Thread Olivier
Hi,

I have some problems indexing large documents in a SolrCloud cluster of
3 servers (Solr 4.8.1) with 3 shards and 2 replicas for each shard, on
Tomcat 7.
There is a specific document (with 300K values in a multivalued field) that
I could not index on SolrCloud, but I could index it in a single Solr
instance on my own PC.

The indexing is done with Solarium from a database. The data indexed are
e-commerce products with classic fields like name, price, description,
instock, etc. The large field (type int) consists of other product ids.
The only difference from other documents indexed correctly on Solr is the
size of that multivalued field: the other documents all have between 100K
and 200K values for that field.
The index size is 11 MB for 20 documents.

To solve it, I tried to change several parameters including ZKTimeout in
solr.xml:

In the solrcloud section:

<int name="zkClientTimeout">6</int>
<int name="distribUpdateConnTimeout">10</int>
<int name="distribUpdateSoTimeout">10</int>

In the shardHandlerFactory section:

<int name="socketTimeout">${socketTimeout:10}</int>
<int name="connTimeout">${connTimeout:10}</int>


I also tried to increase these values in solrconfig.xml:

<requestParsers enableRemoteStreaming="true"
                multipartUploadLimitInKB="1"
                formdataUploadLimitInKB="10"
                addHttpRequestToContext="false"/>




I also tried to increase the amount of RAM (these are VMs): each server
has 4 GB of RAM with 3 GB for the JVM.

Are there other settings that could solve the problem that I might have
forgotten?


The error messages are:

ERROR SolrDispatchFilter - null:java.lang.RuntimeException: [was class java.net.SocketException] Connection reset
ERROR SolrDispatchFilter - null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrDispatchFilter - null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrCore - org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore - org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore - org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore - org.apache.solr.common.SolrException: Unexpected EOF in attribute value
ERROR SolrCore - org.apache.solr.common.SolrException: Unexpected end of input block in start tag

Thanks,

Olivier

Subject=How to Get Highlighting Working in Velocity (Solr 4.8.0)

2014-07-27 Thread Olivier FOSTIER
Maybe you missed that your field dom_title should be
indexed="true" termVectors="true" termPositions="true" termOffsets="true"


Website running Solr

2014-05-11 Thread Olivier Austina
Hi All,
Is there a way to know if a website uses Solr? Thanks.
Regards
Olivier


Problem indexing email attachments

2014-04-23 Thread Olivier . Masseau
Hello, 

I'm trying to index email files with Solr (4.7.2)

The files have the extension .eml (message/rfc822) 

The mail body is correctly indexed but attachments are not indexed if they 
are not .txt files. 

If attachments are .txt files it works, but if the attachments are .pdf or
.docx files they are not indexed.



I checked the extracted text by calling: 

curl 
http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&extractFormat=text
 -F myfile=@Test1.eml 

The returned extracted text does not contain the content of the 
attachments if they are not .txt files. 


It is not a problem with the Apache Tika library not being able to process 
attachments, because running the standalone Apache Tika app by calling: 


java -jar tika-app-1.4.jar -t Test1.eml 


on my eml files correctly displays the attachments' text. 



Maybe it is a problem with how Tika is called by Solr?

Is there something to modify in the default configuration ? 


Thanx for any help ;) 
 
Olivier 

Re: Problem indexing email attachments

2014-04-23 Thread Olivier . Masseau
As I said, it is not a problem in the Tika library ;)

I have tried with Tika 1.5 jars and it gives the same results.



Guido Medina guido.med...@temetra.com wrote on 23/04/2014 16:15:11:

 From: Guido Medina guido.med...@temetra.com
 To: solr-user@lucene.apache.org
 Date: 23/04/2014 16:15
 Subject: Re: Problem indexing email attachments
 
 We particularly massage solr.war and put our own updated jars, maybe 
 this helps:
 
 http://www.apache.org/dist/tika/CHANGES-1.5.txt
 
 We using Tika 1.5 inside Solr with POI 3.10-Final, etc...
 
 Guido.
 
 On 23/04/14 14:38, olivier.mass...@real.lu wrote:
  Hello,
 
  I'm trying to index email files with Solr (4.7.2)
 
  The files have the extension .eml (message/rfc822)
 
  The mail body is correctly indexed but attachments are not indexed if 
they
  are not .txt files.
 
  If attachments are .txt files it works, but if attachment are .pdf of
  .docx files they are not indexed.
 
 
 
  I checked the extracted text by calling:
 
  curl 
  http://localhost:8983/solr/update/extract?
 literal.id=doc1commit=trueextractOnly=trueextractFormat=text
   -F myfile=@Test1.eml
 
  The returned extracted text does not contain the content of the
  attachments if they are not .txt files.
 
 
  It is not a problem with the Apache Tika library not being able to 
process
  attachments, because running the standalone Apache Tika app by 
calling:
 
 
  java -jar tika-app-1.4.jar -t Test1.eml
 
 
  on my eml files correctly displays the attachments' text.
 
 
 
  Maybe is it a problem with how Tika is called by Solr ?
 
  Is there something to modify in the default configuration ?
 
 
  Thanx for any help ;)
  
  Olivier
 


Topology of Solr use

2014-04-17 Thread Olivier Austina
Hi All,
I would like to have an idea of Solr usage: number of users, industries,
countries or any other helpful information. Thank you.
Regards
Olivier


Re: Topology of Solr use

2014-04-17 Thread Olivier Austina
Thank you Markus, the link is very useful.


Regards
Olivier



2014-04-17 18:24 GMT+02:00 Markus Jelsma markus.jel...@openindex.io:

 This may help a bit:

 https://wiki.apache.org/solr/PublicServers

 -Original message-
 From:Olivier Austina olivier.aust...@gmail.com
 Sent:Thu 17-04-2014 18:16
 Subject:Topology of Solr use
 To:solr-user@lucene.apache.org;
 Hi All,
 I would to have an idea about Solr usage: number of users, industry,
 countries or any helpful information. Thank you.
 Regards
 Olivier



Querying specific database attributes or table

2014-03-16 Thread Olivier Austina
Hi,
I am new to Solr.

I would like to index and query a relational database. Is it possible to
query a specific table or attribute of the database? For example, if I have
2 tables A and B that both have the attribute name, and I want to get only
the results from table A and not from table B, is that possible?
Can I restrict the query to only one table without getting results from
other tables?
Is it possible to query a specific attribute of a table?
Is it possible to do join queries like in SQL?
Any suggestion is welcome. Thank you.

Regards
Olivier


Re: feedback on Solr 4.x LotsOfCores feature

2013-10-22 Thread Soyez Olivier
Another way to simulate the core discovery is:
time find $PATH_TO_CORES -name core.properties -type f -exec cat '{}' > /dev/null 2>&1 \;

or just the core.properties read time:
find $PATH_TO_CORES -name core.properties > cores.list
time for i in `cat cores.list`; do cat $i > /dev/null 2>&1; done;

Olivier

On 19/10/2013 11:57, Erick Erickson wrote:

For my quick-and-dirty test I just rebooted my machine totally and still
had 1K/sec core discovery. So this still puzzles me greatly. The time
do do this should be approximated by the time it takes to just walk
your tree, find all the core.properties and read them. I it possible to
just write a tiny Java program to do that? Or rip off the core discovery
code and use that for a small stand-alone program? Because this is quite
a bit at odds with what I've seen. Although now that I think about it,
the code has gone through some revisions since then, but I don't think
they should have affected this...

Best
Erick
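
In the spirit of the tiny Java program suggested above, a minimal sketch that just walks the tree and reads each core.properties file, timing it; the path is a placeholder:

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class CoreDiscoveryTimer {
    public static void main(String[] args) throws IOException {
        // Root of the core directories is an assumption; pass it as the first argument
        Path root = Paths.get(args.length > 0 ? args[0] : "/path/to/cores");
        AtomicInteger count = new AtomicInteger();
        long start = System.nanoTime();
        try (Stream<Path> stream = Files.walk(root)) {
            stream.filter(p -> p.getFileName().toString().equals("core.properties"))
                  .forEach(p -> {
                      Properties props = new Properties();
                      try (InputStream in = Files.newInputStream(p)) {
                          props.load(in);           // roughly what discovery reads per core
                          count.incrementAndGet();
                      } catch (IOException e) {
                          throw new UncheckedIOException(e);
                      }
                  });
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Read " + count.get() + " core.properties files in " + elapsedMs + " ms");
    }
}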


On Fri, Oct 18, 2013 at 2:59 PM, Soyez Olivier
olivier.so...@worldline.com wrote:

 15K cores is around 4 minutes : no network drive, just a spinning disk
 But, one important thing, to simulate a cold start or an useless linux
 buffer cache,
 I used the following command to empty the linux buffer cache :
 sync && echo 3 > /proc/sys/vm/drop_caches
 Then, I started Solr and I found the result above


 On 11/10/2013 13:06, Erick Erickson wrote:


 bq: sharing the underlying solrconfig object the configset introduced
 in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode

 SOLR-4478 will NOT share the underlying config objects, it simply
 shares the underlying directory. Each core will, at least as presently
 envisioned, simply read the files that exist there and create their
 own solrconfig object. Schema objects may be shared, but not config
 objects. It may turn out to be relatively easy to do in the configset
 situation, but last time I looked at sharing the underlying config
 object it was too fraught with problems.

 bq: 15K cores is around 4 minutes

 I find this very odd. On my laptop, spinning disk, I think I was
 seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I
 have no idea what's going on here. If this is just reading the files,
 you should be seeing horrible disk contention. Are you on some kind of
 networked drive?

 bq: To do that in background and to block on that request until core
 discovery is complete, should not work for us (due to the worst case).
 What other choices are there? Either you have to do it up front or
 with some kind of blocking. Hmmm, I suppose you could keep some kind
 of custom store (DB? File? ZooKeeper?) that would keep the last known
 layout. You'd still have some kind of worst-case situation where the
 core you were trying to load wouldn't be in your persistent store and
 you'd _still_ have to wait for the discovery process to complete.

 bq: and we will use the cores Auto option to create load or only load
 the core on
 Interesting. I can see how this could all work without any core
 discovery but it does require a very specific setup.

 On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier
  olivier.so...@worldline.com wrote:
  The corresponding patch for Solr 4.2.1 LotsOfCores can be found in
 SOLR-5316, including the new Cores options :
  - numBuckets to create a subdirectory based on a hash on the corename
 % numBuckets in the core Datadir
  - Auto with 3 differents values :
1) false : default behaviour
2) createLoad : create, if not exist, and load the core on the fly on
 the first incoming request (update, select)
3) onlyLoad : load the core on the fly on the first incoming request
 (update, select), if exist on disk
 
  Concerning :
  - sharing the underlying solrconfig object, the configset introduced in
 JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode.
  We need to test it for our use case. If another solution exists, please
 tell me. We are very interested in such functionality and to contribute, if
 we can.
 
  - the possibility of lotsOfCores in SolrCloud, we don't know in details
 how SolrCloud is working.
  But one possible limit is the maximum number of entries that can be
 added to a zookeeper node.
  Maybe, a solution will be just a kind of hashing in the zookeeper tree.
 
  - the time to discover cores in Solr 4.4 : with spinning disk under
 linux, all cores with transient=true and loadOnStartup=false, the linux
 buffer cache empty before starting Solr :
  15K cores is around 4 minutes. It's linear in the cores number, so for
 50K it's more than 13 minutes. In fact, it corresponding to the time to
 read all core.properties files.
  To do that in background and to block on that request until core
 discovery is complete, should not work for us (due to the worst case).
  So, we will just disable the core Discovery, because we don't need to
 know

Re: feedback on Solr 4.x LotsOfCores feature

2013-10-18 Thread Soyez Olivier
15K cores is around 4 minutes: no network drive, just a spinning disk.
But, one important thing: to simulate a cold start, or a useless Linux buffer
cache, I used the following command to empty the Linux buffer cache:
sync && echo 3 > /proc/sys/vm/drop_caches
Then I started Solr and I got the result above.


On 11/10/2013 13:06, Erick Erickson wrote:


bq: sharing the underlying solrconfig object the configset introduced
in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode

SOLR-4478 will NOT share the underlying config objects, it simply
shares the underlying directory. Each core will, at least as presently
envisioned, simply read the files that exist there and create their
own solrconfig object. Schema objects may be shared, but not config
objects. It may turn out to be relatively easy to do in the configset
situation, but last time I looked at sharing the underlying config
object it was too fraught with problems.

bq: 15K cores is around 4 minutes

I find this very odd. On my laptop, spinning disk, I think I was
seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I
have no idea what's going on here. If this is just reading the files,
you should be seeing horrible disk contention. Are you on some kind of
networked drive?

bq: To do that in background and to block on that request until core
discovery is complete, should not work for us (due to the worst case).
What other choices are there? Either you have to do it up front or
with some kind of blocking. Hmmm, I suppose you could keep some kind
of custom store (DB? File? ZooKeeper?) that would keep the last known
layout. You'd still have some kind of worst-case situation where the
core you were trying to load wouldn't be in your persistent store and
you'd _still_ have to wait for the discovery process to complete.

bq: and we will use the cores Auto option to create load or only load
the core on
Interesting. I can see how this could all work without any core
discovery but it does require a very specific setup.

On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier
olivier.so...@worldline.com wrote:
 The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, 
 including the new Cores options :
 - numBuckets to create a subdirectory based on a hash on the corename % 
 numBuckets in the core Datadir
 - Auto with 3 differents values :
   1) false : default behaviour
   2) createLoad : create, if not exist, and load the core on the fly on the 
 first incoming request (update, select)
   3) onlyLoad : load the core on the fly on the first incoming request 
 (update, select), if exist on disk

 Concerning :
 - sharing the underlying solrconfig object, the configset introduced in JIRA 
 SOLR-4478 seems to be the solution for non-SolrCloud mode.
 We need to test it for our use case. If another solution exists, please tell 
 me. We are very interested in such functionality and to contribute, if we can.

 - the possibility of lotsOfCores in SolrCloud, we don't know in details how 
 SolrCloud is working.
 But one possible limit is the maximum number of entries that can be added to 
 a zookeeper node.
 Maybe, a solution will be just a kind of hashing in the zookeeper tree.

 - the time to discover cores in Solr 4.4 : with spinning disk under linux, 
 all cores with transient=true and loadOnStartup=false, the linux buffer 
 cache empty before starting Solr :
 15K cores is around 4 minutes. It's linear in the cores number, so for 50K 
 it's more than 13 minutes. In fact, it corresponding to the time to read all 
 core.properties files.
 To do that in background and to block on that request until core discovery is 
 complete, should not work for us (due to the worst case).
 So, we will just disable the core Discovery, because we don't need to know 
 all cores from the start. Start Solr without any core entries in solr.xml, 
 and we will use the cores Auto option to create load or only load the core on 
 the fly, based on the existence of the core on the disk (absolute path 
 calculated from the core name).

 Thanks for your interest,

 Olivier
 
 From: Erick Erickson [erickerick...@gmail.com]
 Sent: Monday, October 7, 2013 14:33
 To: solr-user@lucene.apache.org
 Subject: Re: feedback on Solr 4.x LotsOfCores feature

 Thanks for the great writeup! It's always interesting to see how
 a feature plays out in the real world. A couple of questions
 though:

 bq: We added 2 Cores options :
 Do you mean you patched Solr? If so are you willing to shard the code
 back? If both are yes, please open a JIRA, attach the patch and assign
 it to me.

 bq:  the number of file descriptors, it used a lot (need to increase global
 max and per process fd)

 Right, this makes sense since you have a bunch of cores all with their
 own descriptors open. I'm assuming that you hit a rather high max
 number

Re: Re: feedback on Solr 4.x LotsOfCores feature

2013-10-10 Thread Soyez Olivier
The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316,
including the new Cores options:
- numBuckets: to create a subdirectory based on a hash on the corename %
numBuckets in the core dataDir
- Auto, with 3 different values:
  1) false: default behaviour
  2) createLoad: create, if it does not exist, and load the core on the fly on
the first incoming request (update, select)
  3) onlyLoad: load the core on the fly on the first incoming request (update,
select), if it exists on disk

Concerning:
- sharing the underlying solrconfig object: the configset introduced in JIRA
SOLR-4478 seems to be the solution for non-SolrCloud mode.
We need to test it for our use case. If another solution exists, please tell
me. We are very interested in such functionality, and in contributing if we can.

- the possibility of lotsOfCores in SolrCloud: we don't know in detail how
SolrCloud works.
But one possible limit is the maximum number of entries that can be added to a
zookeeper node.
Maybe a solution would be just a kind of hashing in the zookeeper tree.

- the time to discover cores in Solr 4.4: with a spinning disk under Linux, all
cores with transient=true and loadOnStartup=false, and the Linux buffer cache
empty before starting Solr:
15K cores takes around 4 minutes. It is linear in the number of cores, so for
50K it is more than 13 minutes. In fact, it corresponds to the time to read all
the core.properties files.
Doing that in the background and blocking on a request until core discovery is
complete would not work for us (due to the worst case).
So we will just disable core discovery, because we don't need to know all the
cores from the start. We start Solr without any core entries in solr.xml, and we
will use the cores Auto option to create-and-load or only load the core on the
fly, based on the existence of the core on disk (absolute path calculated from
the core name).

Thanks for your interest,

Olivier

From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, October 7, 2013 14:33
To: solr-user@lucene.apache.org
Subject: Re: feedback on Solr 4.x LotsOfCores feature

Thanks for the great writeup! It's always interesting to see how
a feature plays out in the real world. A couple of questions
though:

bq: We added 2 Cores options :
Do you mean you patched Solr? If so are you willing to share the code
back? If both are yes, please open a JIRA, attach the patch and assign
it to me.

bq:  the number of file descriptors, it used a lot (need to increase global
max and per process fd)

Right, this makes sense since you have a bunch of cores all with their
own descriptors open. I'm assuming that you hit a rather high max
number and it stays pretty steady

bq: the overhead to parse solrconfig.xml and load dependencies to open
each core

Right, I tried to look at sharing the underlying solrconfig object but
it seemed pretty hairy. There are some extensive comments in the
JIRA of the problems I foresaw. There may be some action on this
in the future.

bq: lotsOfCores doesn’t work with SolrCloud

Right, we haven't concentrated on that, it's an interesting problem.
In particular it's not clear what happens when nodes go up/down,
replicate, resynch, all that.

bq: When you start, it spend a lot of times to discover cores due to a big

How long? I tried 15K cores on my laptop and I think I was getting 15
second delays or roughly 1K cores discovered/second. Is your delay
on the order of 50 seconds with 50K cores?

I'm not sure how you could do that in the background, but I haven't
thought about it much. I tried multi-threading core discovery and that
didn't help (SSD disk), I assumed that the problem was mostly I/O
contention (but didn't prove it). What if a request came in for a core
before you'd found it? I'm not sure what the right behavior would be
except perhaps to block on that request until core discovery was
complete. Hm. How would that work for your case? That
seems do-able.

BTW, so far you get the prize for the most cores on a node I think.

Thanks again for the great feedback!

Erick

On Mon, Oct 7, 2013 at 3:53 AM, Soyez Olivier
olivier.so...@worldline.com wrote:
 Hello,

 In my company, we use Solr in production to offer full text search on
 mailboxes.
 We host dozens of millions of mailboxes, but only webmail users have such
 a feature (a few million).
 We have the following use case :
 - non static indexes with more updates (indexing and deleting) than
 select requests (ratio 7:1)
 - homogeneous configuration for all indexes
 - not many users at the same time

 We started to index mailboxes with Solr 1.4 in 2010, on a subset of
 400,000 users.
 - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr
 instance
 - we grew to 6000 users per Solr instance, 8 Solr per server, 60GB per
 index (~2 million users)
 - we upgraded to Solr 3.5 in 2012
 As indexes grew, IOPS and the response times increased more and more

feedback on Solr 4.x LotsOfCores feature

2013-10-07 Thread Soyez Olivier
Hello,

In my company, we use Solr in production to offer full text search on
mailboxes.
We host dozens of millions of mailboxes, but only webmail users have such
a feature (a few million).
We have the following use case :
- non static indexes with more updates (indexing and deleting) than
select requests (ratio 7:1)
- homogeneous configuration for all indexes
- not many users at the same time

We started to index mailboxes with Solr 1.4 in 2010, on a subset of
400,000 users.
- we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr
instance
- we grew to 6000 users per Solr instance, 8 Solr per server, 60GB per
index (~2 million users)
- we upgraded to Solr 3.5 in 2012
As indexes grew, IOPS and the response times increased more and more.

The index size was mainly due to stored fields (large .fdt files).
Retrieving these fields from the index was costly, because of many seeks
in large files, and there was no way to limit that usage.
There is also an overhead on queries: too many results are filtered to
find only the results concerning the user.
For these reasons and others, like not pooled users, hardware savings,
better scoring, and some requests that do not support filtering, we
decided to use the LotsOfCores feature.

Our goal was to change the current I/O usage: from lots of random I/O
access on huge segments to mostly sequential I/O access on small segments.
For our use case, it's not a big deal that the first query to a not yet
loaded core will be slow.
And we don't need to fit all the cores into memory at once.

We started from the SOLR-1293 issue and the LotsOfCores wiki page to
finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1
core).
We no longer need to run so many Solr instances per node. We are now able to
have around 5 cores per Solr and we plan to grow to 100,000 cores
per instance.
At first, we used the solr.xml persistence. All cores have the
loadOnStartup=false and transient=true attributes, so a cold start
is very quick. The response times were better than ever, in comparison
with the poor response times we had before using LotsOfCores.

We added 2 Cores options :
- numBuckets: create a subdirectory based on a hash of the corename
% numBuckets in the core dataDir, because all cores cannot live in the
same directory
- Auto, with 3 different values :
1) false : default behaviour
2) createLoad : create the core if it does not exist, and load it on the fly
on the first incoming request (update, select).
3) onlyLoad : load the core on the fly on the first incoming request
(update, select), if it exists on disk

Then, to improve performance and avoid synchronization in the solr.xml
persistence, we disabled it.
The drawback is that we can no longer see the full list of available cores with
the admin core status command, only those warmed up.
Finally, we can achieve very good performance with Solr LotsOfCores :
- Index 5 emails (avg) + commit + search : x4.9 faster response time
(Mean), x5.4 faster (95th per)
- Delete 5 documents (avg) : x8.4 faster response time (Mean), x7.4
faster (95th per)
- Search : x3.7 faster response time (Mean), x4 faster (95th per)

In fact, the better performance is mainly due to the small size of each
index, but also thanks to the isolation between cores (updates and
queries on many mailboxes don't have side effects on each other).
One important thing with the LotsOfCores feature is to take care of :
- the number of file descriptors, it uses a lot (need to increase the global
max and the per-process fd limit)
- the value of transientCacheSize, depending on the RAM size and the
allocated PermGen size (see the sketch after this list)
- the ClassLoader leak that increases minor GC times when the CMS GC is
enabled (use -XX:+CMSClassUnloadingEnabled)
- the overhead of parsing solrconfig.xml and loading dependencies to open
each core
- lotsOfCores doesn't work with SolrCloud, so we store the index
locations outside of Solr. We have Solr proxies to route requests to the
right instance.
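
As an illustration only (Solr 4.x legacy solr.xml syntax; the names and values
below are assumptions, not the actual production configuration), a transient
per-core entry and the cache size to tune look roughly like this:

<solr persistent="true">
  <cores adminPath="/admin/cores" transientCacheSize="512">
    <core name="user12345" instanceDir="cores/42/user12345"
          loadOnStartup="false" transient="true"/>
  </cores>
</solr>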

Outside of production, we tried the core discovery feature in Solr 4.4 with
lots of cores.
When you start, it spends a lot of time discovering cores due to the big
number of cores, and meanwhile all requests fail (SolrDispatchFilter.init()
not done yet). It would be great to have, for example, an option to run core
discovery in the background, or just to be able to disable it, like we do in
our use case.

If someone is interested in these new options for the LotsOfCores feature,
just tell me.



Re: Can solr index folder can be moved from one system to another?

2012-03-22 Thread olivier sallou
The index is not tied to its directory; there is no path information in the
index. You can create an index and then move it anywhere (or merge it with
another one).

I often do this, there is no issue.
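
For the merge case, a minimal sketch (Lucene 3.x API; the paths and version
constant are assumptions, imports omitted):

IndexWriter writer = new IndexWriter(FSDirectory.open(new File("/path/to/target")),
    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
// copies the segments of the source index into the target index
writer.addIndexes(FSDirectory.open(new File("/path/to/source")));
writer.close();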

Olivier

2012/3/22 ravicv ravichandra...@gmail.com

 Hi Tomás,

 I cannot use Solr replication in my scenario. My requirement is to gzip the
 solr index folder and send it to a dotnet system through a webservice.
 Then in dotnet the same index folder should be unzipped and the same folder
 should be used as an index folder through solrnet.

 Whether my requirement is possible?

 Thanks
 Ravi



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Can-solr-index-folder-can-be-moved-from-one-system-to-another-tp3844919p3847725.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)

Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438


Solr 3.5 MoreLikeThis on Date fields

2012-01-16 Thread Jaco Olivier
Hi Everyone,

Please help out if you know what is going on.
We are upgrading to Solr 3.5 (from 1.4.1) and busy with a Re-Index and Test on 
our data.

Everything seems OK, but Date Fields seem to be broken when used with the
MoreLikeThis handler
(I also saw the same error on Date Fields using the HighLighter in another 
forum post Invalid Date String for highlighting any date field match @ Mon 
2011/08/15 13:10 ).
* I deleted the index/core and only loaded a few records and still get the 
error when using the MoreLikeThis using the docdate as part of the mlt.fl 
params.
* I double checked all the data that was loaded and the dates parse 100% and 
can see no problems with any of the data loaded.

Type: <fieldType name="date" class="solr.TrieDateField"
omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
Definition: <field name="docdate" type="date" indexed="true" stored="true"
multiValued="false"/>
A sample result: <date name="docdate">1999-06-28T00:00:00Z</date>

THE MLT QUERY:

Jan 16, 2012 4:09:16 PM org.apache.solr.core.SolrCore execute
INFO: [legal_spring] webapp=/solr path=/select 
params={mlt.fl=doctitle,pld_pubtype,docdate,pld_cluster,pld_port,pld_summary,alltext,subclass&mlt.mintf=1&mlt=true&version=2.2&fl=doc_id,doctitle,docdate,prodtype&qt=mlt&mlt.boost=true&mlt.qf=doctitle^5.0+alltext^0.2&json.nl=map&wt=json&rows=50&mlt.mindf=1&mlt.count=50&start=0&q=doc_id:PLD23996}
 status=400 QTime=1

THE ERROR:

Jan 16, 2012 4:09:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'94046400'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at 
org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:106)
at 
org.apache.solr.analysis.TrieTokenizer.init(TrieTokenizerFactory.java:76)
at 
org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:51)
at 
org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:41)
at 
org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:68)
at 
org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:75)
at 
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385)
at 
org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:876)
at 
org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:820)
at 
org.apache.lucene.search.similar.MoreLikeThis.like(MoreLikeThis.java:629)
at 
org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:311)
at 
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:149)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)

Sincerely,
Jaco Olivier


Re: solr distributed search don't work

2011-09-01 Thread olivier sallou
<requestHandler name="MYREQUESTHANDLER" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="facet.method">enum</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">10</str>
    <str name="shards">192.168.1.6/solr/,192.168.1.7/solr/</str>
  </lst>
</requestHandler>

2011/8/19 Li Li fancye...@gmail.com

 could you please show me your configuration in solrconfig.xml?

 On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou
 olivier.sal...@gmail.com wrote:
  Hi,
  I do not use spell but I use distributed search, using qt=spell is
 correct,
  should not use qt=/spell.
  For shards, I specify it in solrconfig directly, not in url, but should
  work the same.
  Maybe an issue in your spell request handler.
 
 
  2011/8/19 Li Li fancye...@gmail.com
 
  hi all,
  I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent
  but there is something wrong.
  the url given my the wiki is
 
 
 http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr
  but it does not work. I trace the codes and find that
  qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell
  After modification of url, It return all documents but nothing
  about spell check.
  I debug it and find the
  AbstractLuceneSpellChecker.getSuggestions() is called.
 
 



Re: solr distributed search don't work

2011-08-19 Thread olivier sallou
Hi,
I do not use spell but I use distributed search, using qt=spell is correct,
should not use qt=/spell.
For shards, I specify it in solrconfig directly, not in url, but should
work the same.
Maybe an issue in your spell request handler.


2011/8/19 Li Li fancye...@gmail.com

 hi all,
 I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent
 but there is something wrong.
 the url given my the wiki is

 http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr
 but it does not work. I trace the codes and find that
 qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell
 After modification of url, It return all documents but nothing
 about spell check.
 I debug it and find the
 AbstractLuceneSpellChecker.getSuggestions() is called.



lucene 3 and merge/optimize

2011-08-18 Thread olivier sallou
Hi,
after an upgrade to solr/lucene 3, I tried to change the code to remove
deprecated functions. Though the new MergePolicy etc. are not really
clear.

I have now issues with the merge and optimize functions.

I have a command line application (Java/Lucene API) that merges multiple
indexes into a single one, or optimizes an existing index (this is done
offline).

When I execute my code, the merge creates a new index, but it seems to contain
more files than before (with Solr 1.4.1), why not...
When I try to optimize, the code says OK, but I still have many files and
segments (below for a very small example):
_0.fdt  _0.tis  _1.tii  _2.prx  _3.nrm  _4.frq  _5.fnm  _6.fdx  _7.fdt
 _7.tis  _8.tii  _9.prx  _a.nrm  _b.frq
_0.fdx  _1.fdt  _1.tis  _2.tii  _3.prx  _4.nrm  _5.frq  _6.fnm  _7.fdx
 _8.fdt  _8.tis  _9.tii  _a.prx  _b.nrm
_0.fnm  _1.fdx  _2.fdt  _2.tis  _3.tii  _4.prx  _5.nrm  _6.frq  _7.fnm
 _8.fdx  _9.fdt  _9.tis  _a.tii  _b.prx
_0.frq  _1.fnm  _2.fdx  _3.fdt  _3.tis  _4.tii  _5.prx  _6.nrm  _7.frq
 _8.fnm  _9.fdx  _a.fdt  _a.tis  _b.tii
_0.nrm  _1.frq  _2.fnm  _3.fdx  _4.fdt  _4.tis  _5.tii  _6.prx  _7.nrm
 _8.frq  _9.fnm  _a.fdx  _b.fdt  _b.tis
_0.prx  _1.nrm  _2.frq  _3.fnm  _4.fdx  _5.fdt  _5.tis  _6.tii  _7.prx
 _8.nrm  _9.frq  _a.fnm  _b.fdx  segments_1
_0.tii  _1.prx  _2.nrm  _3.frq  _4.fnm  _5.fdx  _6.fdt  _6.tis  _7.tii
 _8.prx  _9.nrm  _a.frq  _b.fnm  segments.gen

I'd like to reduce with the optimize or the merge to the minimum the number
of files, my index is read only and does not change.

Here is the code for optimize, am I doing something wrong?

IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
    new StandardAnalyzer(Version.LUCENE_33));

conf.setRAMBufferSizeMB(50);

LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();

policy.setMaxMergeDocs(10);

conf.setMergePolicy(policy);

IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR), getIndexConfig());

writer.optimize();

writer.close();



Thanks


Olivier


Re: lucene 3 and merge/optimize

2011-08-18 Thread olivier sallou
answer to myself, to be checked...

I used policy.setMaxMergeDocs(10), limiting to a small number of files, at
least for the merge.
I'm going to test.
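
For reference, a minimal sketch of the optimize path without that cap (Lucene
3.3 API; the index path is an assumption, imports omitted as in the snippet
above):

// Leave maxMergeDocs at its default so optimize() can merge down to one segment.
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
    new StandardAnalyzer(Version.LUCENE_33));
conf.setMergePolicy(new LogByteSizeMergePolicy());
IndexWriter writer = new IndexWriter(FSDirectory.open(new File("/path/to/index")), conf);
writer.optimize();   // merges the read-only index into a single segment
writer.close();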

2011/8/18 olivier sallou olivier.sal...@gmail.com

 Hi,
 after an upgrade to solr/lucene 3, I tried to change the code to remove
 deprecated functions  Though new MergePolicy etc... are not really
 clear.

 I have now issues with the merge and optimize functions.

 I have a command line application (Java/Lucene api) that merge multiple
 indexes in a single one, or optimize an existing index (this is done
 offline)

 When I execute my code, the merge creates a new index, but looks to contain
 more files than before (with solr 4.1), why not...
 When I try to optimize, code says OK, but I still have many files, segments
 : (below for a very small example)
 _0.fdt  _0.tis  _1.tii  _2.prx  _3.nrm  _4.frq  _5.fnm  _6.fdx  _7.fdt
  _7.tis  _8.tii  _9.prx  _a.nrm  _b.frq
 _0.fdx  _1.fdt  _1.tis  _2.tii  _3.prx  _4.nrm  _5.frq  _6.fnm  _7.fdx
  _8.fdt  _8.tis  _9.tii  _a.prx  _b.nrm
 _0.fnm  _1.fdx  _2.fdt  _2.tis  _3.tii  _4.prx  _5.nrm  _6.frq  _7.fnm
  _8.fdx  _9.fdt  _9.tis  _a.tii  _b.prx
 _0.frq  _1.fnm  _2.fdx  _3.fdt  _3.tis  _4.tii  _5.prx  _6.nrm  _7.frq
  _8.fnm  _9.fdx  _a.fdt  _a.tis  _b.tii
 _0.nrm  _1.frq  _2.fnm  _3.fdx  _4.fdt  _4.tis  _5.tii  _6.prx  _7.nrm
  _8.frq  _9.fnm  _a.fdx  _b.fdt  _b.tis
 _0.prx  _1.nrm  _2.frq  _3.fnm  _4.fdx  _5.fdt  _5.tis  _6.tii  _7.prx
  _8.nrm  _9.frq  _a.fnm  _b.fdx  segments_1
 _0.tii  _1.prx  _2.nrm  _3.frq  _4.fnm  _5.fdx  _6.fdt  _6.tis  _7.tii
  _8.prx  _9.nrm  _a.frq  _b.fnm  segments.gen

 I'd like to reduce with the optimize or the merge to the minimum the number
 of files, my index is read only and does not change.

 Here is the code for optimize, am I doing something wrong?

  IndexWriterConfig conf = new 
 IndexWriterConfig(Version.LUCENE_33,newStandardAnalyzer(Version.
 LUCENE_33));

  conf.setRAMBufferSizeMB(50);

  LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();

  policy.setMaxMergeDocs(10);

  conf.setMergePolicy(policy);

  IndexWriter writer = 
 newIndexWriter(FSDirectory.open(INDEX_DIR),getIndexConfig() );


   writer.optimize();

  writer.close();



 Thanks


 Olivier



SOlr upgrade: Invalid version (expected 2, but 1) error when using shards

2011-08-16 Thread olivier sallou
Hi,
I just migrated to solr 3.3 from 1.4.1.
My index is still in 1.4.1 format (will be migrated soon).

I have an error when I use sharding with the new version:

org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid
version (expected 2, but 1) or the data in not in 'javabin' format

However, if I request each shard independently (/request), the answer is
correct. So the error is triggered only with the shard mechanism.

While I plan to upgrade my indexes, I'd like to understand the issue,
e.g. is it an upgrade issue, or do shards not support using an old format?

Thanks

Olivier


Re: how to request for Json object

2011-06-02 Thread olivier sallou
Ajax does not allow requests to another domain.
The only way, unless using server-side requests, is to go through a proxy that
hides the host origin, so that the Ajax request thinks both servers are the
same.

2011/6/2 Romi romijain3...@gmail.com

 How to parse Json through ajax when your ajax pager is on one
 server(Tomcat)and Json object is of onther server(solr server). i mean i
 have to make a request to another server, how can i do it .

 -
 Thanks  Regards
 Romi
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014138.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Solr Cell and operations on metadata extracted

2011-05-16 Thread Olivier Tavard
Hi,



I have a question about Solr Cell please.

I index some files. For example, if I want to extract the filename, then use
a hash function on it like MD5 and then store it in Solr: is the correct way
to use Tika "manually" to extract the metadata I want, do the
transformations on it and then send it to Solr?

I can't use Solr Cell directly in this case because I can't do modifications
on the extracted metadata, right?





Thanks,



Olivier


Replication and CPU

2010-10-12 Thread Olivier RICARD

Hello,

I setup a server for the replication of Solr. I used 2 cores and for 
each one I specified the replication. I followed the tutorial on 
http://wiki.apache.org/solr/SolrReplication.


The replication is OK for each core. However the CPU is at 100% on
the slave. The master and slave are 2 servers with the same hardware
configuration. I don't understand what can cause the problem. The slave
is launched by :



java -Dsolr.solr.home=/solr/multicore -Denable.master=false 
-Denable.slave=true -Xms512m -Xmx1536m -XX:+UseConcMarkSweepGC -jar 
start.jar


If I comment out the replication, the server is OK.

Anyone have an idea ?

Regards,
Olivier


Re: Replication and CPU

2010-10-12 Thread Olivier RICARD

Hello Peter,

On the slave server http://slave/solr/core0/admin/replication/index.jsp

Poll Interval00:30:00
Local Index Index Version: 1284026488242, Generation: 13102
Location: /solr/multicore/core0/data/index
Size: 26.9 GB
Times Replicated Since Startup: 289
Previous Replication Done At: Tue Oct 12 12:00:00 GMT+02:00 2010
Config Files Replicated At: 1286790818824
Config Files Replicated: [solrconfig_slave.xml]
Times Config Files Replicated Since Startup: 1
Next Replication Cycle At: Tue Oct 12 12:30:00 GMT+02:00 2010

The request Handler on the slave  :
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:30:00</str>
  </lst>
</requestHandler>

I increased the poll interval because I thought that there were too many
changes. Currently there are no changes on the master and the slave is
still at 100% CPU.



On the master, I have

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml,protwords.txt,spellings.txt,synonyms.txt</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>

Regards,
Olivier


On 12/10/2010 12:11, Peter Karich wrote:

Hi Olivier,

maybe the slave replicates after startup? check replication status here:
http://localhost/solr/admin/replication/index.jsp

what is your poll frequency (could you paste the replication part)?

Regards,
Peter.


Hello,

I setup a server for the replication of Solr. I used 2 cores and for
each one I specified the replication. I followed the tutorial on
http://wiki.apache.org/solr/SolrReplication.

The replication is OK for each cores. However the CPU is used to 100%
on the slave. The master and slave are 2 servers with the same
hardware configuration. I don't understand which can cause the
problem. The slave is launched by :


java -Dsolr.solr.home=/solr/multicore -Denable.master=false
-Denable.slave=true -Xms512m -Xmx1536m -XX:+UseConcMarkSweepGC -jar
start.jar

If I comment the replication the server is OK.

Anyone have an idea ?

Regards,
Olivier








Solr and Lucene in South Africa

2010-07-30 Thread Jaco Olivier
Hi to all Solr/Lucene Users...

Our team had a discussion today regarding the Solr/Lucene community closer to 
home.
I am hereby putting out an SOS to all Solr/Lucene users in the South African 
market and wish to organize a meet-up (or user support group) if at all 
possible.
It would be great to share some triumphs and pitfalls that were experienced.

* Sorry for hogging the User Mailing list with a non-technical question, but I think 
this is the easiest way to get it done :)

Jaco Olivier
Web Specialist



Re: Spatial filtering

2010-07-20 Thread Olivier Ricordeau



On 20/07/2010 04:18, Lance Norskog wrote:

Add the debugQuery=true parameter and it will show you the Lucene
query tree, and how each document is evaluated. This can help with the
more complex queries.


Do you see something wrong?

 [debug] = Array
(
[rawquerystring] = *:*
[querystring] = *:*
[parsedquery] = MatchAllDocsQuery(*:*)
[parsedquery_toString] = *:*
[explain] = Array
(
[doc_45269] =
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

[doc_50206] =
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

[doc_50396] =
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

[doc_51199] =
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

[]

)

[QParser] = LuceneQParser
[filter_queries] = Array
(
[0] = +object_type:Concert 
+date:[2010-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] +{!sfilt 
fl=coords_lat_lon,units=km,meas=hsin}

)

[parsed_filter_queries] = Array
(
[0] = +object_type:Concert +date:[127958400 TO 
1311206399000] +name:{!sfilt TO fl=coords_lat_lon,units=km,meas=hsin}

)

[...]

I'm not sure about the parsed_filter_queries entry. It looks like the
+{!sfilt fl=coords_lat_lon,units=km,meas=hsin} is not interpreted correctly
(it seems to be interpreted as a range). Does anyone know what the
right syntax is? This is not documented...


Cheers,
Olivier



On Mon, Jul 19, 2010 at 3:35 AM, Olivier Ricordeau
oliv...@ricordeau.org  wrote:

Hi folks,

I can't manage to have the new spatial filtering feature (added in r962727
by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568)
working. I'm trying to get all the documents located within a circle defined
by its center and radius.
I've modified my query url as specified in
http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add the
pt, d and meas parameters. Here is what my query parameters looks like
(from Solr's response with debug mode activated):

[params] =  Array
(
[explainOther] =  true
[mm] =  2-75%
[d] =  50
[sort] =  date asc
[qf] =
[wt] =  php
[rows] =  5000
[version] =  2.2
[fl] =  object_type object_id score
[debugQuery] =  true
[start] =  0
[q] =  *:*
[meas] =  hsin
[pt] =  48.85341,2.3488
[bf] =
[qt] =  standard
[fq] =  +object_type:Concert +date:[2010-07-19T00:00:00Z
TO 2011-07-19T23:59:59Z]
)



With this query, I get 3859 results. And some (lots) of the found documents
are not located whithin the circle! :(
If I run the same query without spatial filtering (if I remove the pt, d
and meas parameters from the url), I get 3859 results too. So it looks
like my spatial filtering constraint is not taken into account in the first
search query (the one where pt, d and meas are set). Is the wiki's doc
up to date?

In the comments of SOLR-1568, I've seen someone talking about adding
{!sfilt fl=latlon_field_name}. So I tried the following request:

[params] =  Array
(
[explainOther] =  true
[mm] =  2-75%
[d] =  50
[sort] =  date asc
[qf] =
[wt] =  php
[rows] =  5000
[version] =  2.2
[fl] =  object_type object_id score
[debugQuery] =  true
[start] =  0
[q] =  *:*
[meas] =  hsin
[pt] =  48.85341,2.3488
[bf] =
[qt] =  standard
[fq] =  +object_type:Concert +date:[2010-07-19T00:00:00Z
TO 2011-07-19T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin}
)

This leads to 2713 results (which is smaller than 3859, good). But some
(lots) of the results are once more out of the circle :(

Can someone help me get spatial filtering working? I really don't understand
the search results I'm getting.

Cheers,
Olivier

--
- *Olivier RICORDEAU* -
  oliv...@ricordeau.org
http://olivier.ricordeau.org








--
- *Olivier RICORDEAU* -
 oliv...@ricordeau.org
http://olivier.ricordeau.org



Re: Spatial filtering

2010-07-20 Thread Olivier Ricordeau
Ok, I have found a big bug in my indexing script. Things are getting
better. I managed to get my parsed_filter_query to:
+coords_lat_lon_0_latLon:[48.694179707855874 TO 49.01213545059667] 
+coords_lat_lon_1_latLon:[2.1079512793239767 TO 2.5911832073858765]


For the record, here are the parameters which made it work:
[params] = Array
(
[explainOther] = true
[mm] = 2-75%
[d] = 25
[sort] = date asc
[qf] =
[wt] = php
[rows] = 5000
[version] = 2.2
[fl] = * score
[debugQuery] = true
[start] = 0
[q] = *:*
[meas] = hsin
[pt] = 48.85341,2.3488
[bf] =
[qt] = standard
[fq] = {!sfilt fl=coords_lat_lon} 
+object_type:Concert +date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]

)
But I am facing one problem: the  +object_type:Concert + 
date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] part of my fq 
parameter is not taken into account (see the parsed_filter_query above).

So here is my question:
How can I mix the {!sfilt fl=coords_lat_lon} part of the fq parameter 
with usual fq parameters (eg: +object_type:Concert)?


Can anyone help?
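
For what it's worth, one commonly used approach (not confirmed in this thread)
is to pass the spatial filter and the other constraints as separate fq
parameters, so that each one is parsed on its own:

fq={!sfilt fl=coords_lat_lon}
fq=+object_type:Concert +date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]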

Regards,
Olivier


On 20/07/2010 09:53, Olivier Ricordeau wrote:



On 20/07/2010 04:18, Lance Norskog wrote:

Add the debugQuery=true parameter and it will show you the Lucene
query tree, and how each document is evaluated. This can help with the
more complex queries.


Do you see something wrong?

[debug] = Array
(
[rawquerystring] = *:*
[querystring] = *:*
[parsedquery] = MatchAllDocsQuery(*:*)
[parsedquery_toString] = *:*
[explain] = Array
(
[doc_45269] =
1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm

[doc_50206] =
1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm

[doc_50396] =
1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm

[doc_51199] =
1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm

[]

)

[QParser] = LuceneQParser
[filter_queries] = Array
(
[0] = +object_type:Concert +date:[2010-07-20T00:00:00Z TO
2011-07-20T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin}
)

[parsed_filter_queries] = Array
(
[0] = +object_type:Concert +date:[127958400 TO 1311206399000]
+name:{!sfilt TO fl=coords_lat_lon,units=km,meas=hsin}
)

[...]

I'm not sure about the parsed_filter_queries entry. It looks like the
+{!sfilt fl=coords_lat_lon,units=km,meas=hsin} is not well interpreted
(seems like it's interpreted as a range). Does anyone know what the
right syntax? This is not documented...

Cheers,
Olivier



On Mon, Jul 19, 2010 at 3:35 AM, Olivier Ricordeau
oliv...@ricordeau.org wrote:

Hi folks,

I can't manage to have the new spatial filtering feature (added in
r962727
by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568)
working. I'm trying to get all the documents located within a circle
defined
by its center and radius.
I've modified my query url as specified in
http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to
add the
pt, d and meas parameters. Here is what my query parameters
looks like
(from Solr's response with debug mode activated):

[params] = Array
(
[explainOther] = true
[mm] = 2-75%
[d] = 50
[sort] = date asc
[qf] =
[wt] = php
[rows] = 5000
[version] = 2.2
[fl] = object_type object_id score
[debugQuery] = true
[start] = 0
[q] = *:*
[meas] = hsin
[pt] = 48.85341,2.3488
[bf] =
[qt] = standard
[fq] = +object_type:Concert +date:[2010-07-19T00:00:00Z
TO 2011-07-19T23:59:59Z]
)



With this query, I get 3859 results. And some (lots) of the found
documents
are not located whithin the circle! :(
If I run the same query without spatial filtering (if I remove the
pt, d
and meas parameters from the url), I get 3859 results too. So it looks
like my spatial filtering constraint is not taken into account in the
first
search query (the one where pt, d and meas are set). Is the
wiki's doc
up to date?

In the comments of SOLR-1568, I've seen someone talking about adding
{!sfilt fl=latlon_field_name}. So I tried the following request:

[params] = Array
(
[explainOther] = true
[mm] = 2-75%
[d] = 50
[sort] = date asc
[qf] =
[wt] = php
[rows] = 5000
[version] = 2.2
[fl] = object_type object_id score
[debugQuery] = true
[start] = 0
[q] = *:*
[meas] = hsin
[pt] = 48.85341,2.3488
[bf] =
[qt] = standard
[fq] = +object_type:Concert +date:[2010-07-19T00:00:00Z
TO 2011-07-19T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin}
)

This leads to 2713 results (which is smaller than 3859, good). But some
(lots) of the results are once more out of the circle :(

Can someone help me get spatial filtering working? I really don't
understand
the search results I'm getting.

Cheers,
Olivier

--
- *Olivier RICORDEAU* -
oliv...@ricordeau.org

Re: dismax request handler without q

2010-07-20 Thread olivier sallou
q will search in the defaultSearchField if no field name is set, but you can
specify in your q param the fields you want to search in.

Dismax is a handler where you can specify a number of fields to look in for
the input query. In this case, you do not specify the fields and dismax will
look in the fields specified in its configuration.
However, by default, dismax is not used; it needs to be selected with the
query type parameter (qt=dismax).

In the default solr config, you can call ...solr/select?q=keyphrase:hotel if
keyphrase is a declared field in your schema.
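
For illustration (handler and field names are assumptions), a dismax request
restricted to the keyphrase field could look like:

...solr/select?qt=dismax&q=web+development&qf=keyphrase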

2010/7/20 Chamnap Chhorn chamnapchh...@gmail.com

 I can't put q=keyphrase:hotel in my request using dismax handler. It
 returns
 no result.

 On Tue, Jul 20, 2010 at 1:19 PM, Chamnap Chhorn chamnapchh...@gmail.com
 wrote:

  There are some default configuration on my solrconfig.xml that I didn't
  show you. I'm a little confused when reading
  http://wiki.apache.org/solr/DisMaxRequestHandler#q. I think q is for
 plain
  user input query.
 
 
  On Tue, Jul 20, 2010 at 12:08 PM, olivier sallou 
 olivier.sal...@gmail.com
   wrote:
 
  Hi,
  this is not very clear, if you need to query only keyphrase, why don't
 you
  query directly it? e.g. q=keyphrase:hotel ?
  Furthermore, why dismax if only keyphrase field is of interest? dismax
 is
  used to query multiple fields automatically.
 
  At least dismax do not appear in your query (using query type). It is
 set
  in
  your config for your default request handler?
 
  2010/7/20 Chamnap Chhorn chamnapchh...@gmail.com
 
   I wonder how could i make a query to return only *all books* that has
   keyphrase web development using dismax handler? A book has multiple
   keyphrases (keyphrase is multivalued column). Do I have to pass q
   parameter?
  
  
   Is it the correct one?
   http://localhost:8081/solr/select?q=hotel&fq=keyphrase:%20hotel
  
   --
   Chhorn Chamnap
   http://chamnapchhorn.blogspot.com/
  
 
 
 
 
  --
  Chhorn Chamnap
  http://chamnapchhorn.blogspot.com/
 



 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/



Spatial filtering

2010-07-19 Thread Olivier Ricordeau

Hi folks,

I can't manage to have the new spatial filtering feature (added in 
r962727 by Grant Ingersoll, see 
https://issues.apache.org/jira/browse/SOLR-1568) working. I'm trying to 
get all the documents located within a circle defined by its center and 
radius.
I've modified my query url as specified in 
http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add 
the pt, d and meas parameters. Here is what my query parameters 
looks like (from Solr's response with debug mode activated):


[params] = Array
(
[explainOther] = true
[mm] = 2-75%
[d] = 50
[sort] = date asc
[qf] =
[wt] = php
[rows] = 5000
[version] = 2.2
[fl] = object_type object_id score
[debugQuery] = true
[start] = 0
[q] = *:*
[meas] = hsin
[pt] = 48.85341,2.3488
[bf] =
[qt] = standard
[fq] = +object_type:Concert 
+date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z]

)



With this query, I get 3859 results. And some (lots) of the found 
documents are not located within the circle! :(
If I run the same query without spatial filtering (if I remove the pt, 
d and meas parameters from the url), I get 3859 results too. So it 
looks like my spatial filtering constraint is not taken into account in 
the first search query (the one where pt, d and meas are set). Is 
the wiki's doc up to date?


In the comments of SOLR-1568, I've seen someone talking about adding 
{!sfilt fl=latlon_field_name}. So I tried the following request:


[params] = Array
(
[explainOther] = true
[mm] = 2-75%
[d] = 50
[sort] = date asc
[qf] =
[wt] = php
[rows] = 5000
[version] = 2.2
[fl] = object_type object_id score
[debugQuery] = true
[start] = 0
[q] = *:*
[meas] = hsin
[pt] = 48.85341,2.3488
[bf] =
[qt] = standard
[fq] = +object_type:Concert 
+date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] +{!sfilt 
fl=coords_lat_lon,units=km,meas=hsin}

)

This leads to 2713 results (which is smaller than 3859, good). But some 
(lots) of the results are once more out of the circle :(


Can someone help me get spatial filtering working? I really don't 
understand the search results I'm getting.


Cheers,
Olivier

--
- *Olivier RICORDEAU* -
 oliv...@ricordeau.org
http://olivier.ricordeau.org



How to get the list of all available fields in a (sharded) index

2010-07-19 Thread olivier sallou
Hi,
I cannot find any info on how to get the list of current fields in an index
(possibly sharded). With dynamic fields, I cannot simply parse the schema to
know what fields are available.
Is there any way to get it via a request (or easily programmable)? I know the
information is available in one of the Lucene-generated files, but I'd like
to get it via a query for my whole index.

Thanks

Olivier


Re: dismax request handler without q

2010-07-19 Thread olivier sallou
Hi,
this is not very clear: if you need to query only keyphrase, why don't you
query it directly? e.g. q=keyphrase:hotel ?
Furthermore, why dismax if only the keyphrase field is of interest? dismax is
used to query multiple fields automatically.

At least dismax does not appear in your query (via the query type). Is it set in
your config for your default request handler?

2010/7/20 Chamnap Chhorn chamnapchh...@gmail.com

 I wonder how could i make a query to return only *all books* that has
 keyphrase web development using dismax handler? A book has multiple
 keyphrases (keyphrase is multivalued column). Do I have to pass q
 parameter?


 Is it the correct one?
 http://localhost:8081/solr/select?q=hotel&fq=keyphrase:%20hotel

 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/



Re: Tag generation

2010-07-15 Thread Olivier Dobberkau

 On 15.07.2010 at 17:34, kenf_nc wrote:

 A colleague mentioned that he knew of services where you pass some content
 and it spits out some suggested Tags or Keywords that would be best suited
 to associate with that content.
 
 Does anyone know if there is a contrib to Solr or Lucene that does something
 like this? Or a third party tool that can be given a solr index or solr
 query and it comes up with some good Tag suggestions?

Hi

there is something from http://www.zemanta.com/
and something from basis tech http://www.basistech.com/

I am not sure if this would help. You could have a look at

http://uima.apache.org/

greetings,

olivier

--

Olivier Dobberkau



Faceted search outofmemory

2010-06-29 Thread olivier sallou
Hi,
I am trying to do a faceted search on a very large index (around 200GB with 200M
docs).
I get an out of memory error. With no facets it works fine.

There are quite a few questions around this, but I could not find the answer.
How can we know the required memory when facets are used, so that I can
scale my server/index correctly to handle it?

Thanks

Olivier


Re: Faceted search outofmemory

2010-06-29 Thread olivier sallou
How do I do paging over facets?

2010/6/29 Ankit Bhatnagar abhatna...@vantage.com


 Did you trying paging them?


 -Original Message-
 From: olivier sallou [mailto:olivier.sal...@gmail.com]
 Sent: Tuesday, June 29, 2010 2:04 PM
 To: solr-user@lucene.apache.org
 Subject: Faceted search outofmemory

 Hi,
 I try to make a faceted search on a very large index (around 200GB with
 200M
 doc).
 I have an out of memory error. With no facet it works fine.

 There are quite many questions around this but I could not find the answer.
 How can we know the required memory when facets are used so that I try to
 scale my server/index correctly to handle it.

 Thanks

 Olivier



Re: Faceted search outofmemory

2010-06-29 Thread olivier sallou
I have given 6G to Tomcat. Using facet.method=enum and facet.limit seems to
fix the issue in a few tests, but I do know that it is not a final
solution; it will work only under certain configurations.

The real issue is to be able to know the required RAM for an index...
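
For reference, the kind of request that worked here might look like this (the
field name is an assumption):

...solr/select?q=*:*&facet=true&facet.field=category&facet.method=enum&facet.limit=100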

2010/6/29 Nagelberg, Kallin knagelb...@globeandmail.com

 How much memory have you given the solr jvm? Many servlet containers have
 small amount by default.

 -Kal

 -Original Message-
 From: olivier sallou [mailto:olivier.sal...@gmail.com]
 Sent: Tuesday, June 29, 2010 2:04 PM
 To: solr-user@lucene.apache.org
 Subject: Faceted search outofmemory

 Hi,
 I try to make a faceted search on a very large index (around 200GB with
 200M
 doc).
 I have an out of memory error. With no facet it works fine.

 There are quite many questions around this but I could not find the answer.
 How can we know the required memory when facets are used so that I try to
 scale my server/index correctly to handle it.

 Thanks

 Olivier



Re: Need help on Solr Cell usage with specific Tika parser

2010-06-15 Thread olivier sallou
Thanks,
moving it to a direct child worked.

Olivier

2010/6/14 Chris Hostetter hossman_luc...@fucit.org


 : In solrconfig, in update/extract requesthandler I specified str
 : name=tika.config./tika-config.xml/str , where tika-config.xml is in
 : conf directory (same as solrconfig).

 can you show us the full requestHandler declaration? ... tika.config needs
 to be a direct child of the requestHandler (not in the defaults)

 I also don't know if using a local path like that will work -- depends
 on how that file is loaded (if solr loads it, then you might want to
 remove the ./;  if solr just gives the path to tika, then you probably
 need an absolute path.


 -Hoss




Need help on Solr Cell usage with specific Tika parser

2010-06-14 Thread olivier sallou
Hi,
I use Solr Cell to send specific content files. I developed a dedicated
Parser for specific mime types.
However I cannot get Solr to accept my new mime types.

In solrconfig, in the update/extract requesthandler I specified
<str name="tika.config">./tika-config.xml</str>, where tika-config.xml is in
the conf directory (same as solrconfig).

In tika-config I added my mimetypes:

<parser name="parse-readseq"
        class="org.irisa.genouest.tools.readseq.ReadSeqParser">
  <mime>biosequence/document</mime>
  <mime>biosequence/embl</mime>
  <mime>biosequence/genbank</mime>
</parser>

I do not know for:
  <mimeTypeRepository resource="./tika-mimetypes.xml" magic="false"/>

whether the path to the tika mimetypes should be absolute or relative... and even
whether this file needs to be redefined if magic is not used.


When I run my update/extract, I have an error that biosequence/document
does not match any known parser.

Thanks

Olivier


Re: Need help on Solr Cell usage with specific Tika parser

2010-06-14 Thread olivier sallou
Yeap, I do.
As magic is not set, this is the reason why it looks for this specific
mime-type. Unfortunately, it seems it either does not read my specific
tika-config file or the mime-type file. But there is no error log concerning
those files... (not trying to load them?)


2010/6/14 Ken Krugler kkrugler_li...@transpac.com

 Hi Olivier,

 Are you setting the mime type explicitly via the stream.type parameter?

 -- Ken


 On Jun 14, 2010, at 9:14am, olivier sallou wrote:

  Hi,
 I use Solr Cell to send specific content files. I developped a dedicated
 Parser for specific mime types.
 However I cannot get Solr accepting my new mime types.

 In solrconfig, in update/extract requesthandler I specified str
 name=tika.config./tika-config.xml/str , where tika-config.xml is in
 conf directory (same as solrconfig).

 In tika-config I added my mimetypes:

 parser name=parse-readseq
 class=org.irisa.genouest.tools.readseq.ReadSeqParser
   mimebiosequence/document/mime
   mimebiosequence/embl/mime
   mimebiosequence/genbank/mime
   /parser

 I do not know for:
  mimeTypeRepository resource=./tika-mimetypes.xml magic=false/

 whereas path to tika mimetypes should be absolute or relative... and even
 if
 this file needs to be redefined if magic is not used.


 When I run my update/extract, I have an error that biosequence/document
 does not match any known parser.

 Thanks

 Olivier


 
 Ken Krugler
 +1 530-210-6378
 http://bixolabs.com
 e l a s t i c   w e b   m i n i n g







Re: solr itas

2010-06-11 Thread olivier sallou
did you update solrconfig.xml to add /itas query handler?

2010/6/11 s...@icarinae.com

 Hi,

 When I type http://127.0.0.1:8080/solr/itas

 I receive this result in the webpage instead of an HTML page. Does anyone
 know the reason and/or have a suggestion to fix it?

 <?xml version="1.0" encoding="UTF-8" ?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">62</int>
   </lst>
   <result name="response" numFound="3" start="0" maxScore="1.0">
     <doc>
       <float name="score">1.0</float>
       <arr name="company">
         <str>Lucid Imagination</str>
       </arr>
       <arr name="country">
         <str>USA</str>
       </arr>
       <arr name="first">




 Thanks,





Re: newbie question on how to batch commit documents

2010-06-01 Thread olivier sallou
I would additionally suggest using EmbeddedSolrServer for large uploads if
possible; performance is better.
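
For illustration, a sketch of the batching idea with plain SolrJ (field names
taken from the snippet quoted below; "server" may be a CommonsHttpSolrServer or
an EmbeddedSolrServer, imports omitted): build the whole batch, add it once,
and commit once at the end instead of on every iteration.

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (Widget w : widgets) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", w.getId());
    doc.addField("name", w.getName());
    doc.addField("price", w.getPrice());
    docs.add(doc);
}
server.add(docs);    // one request for the whole batch
server.commit();     // one commit, so warming searchers do not pile up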

2010/5/31 Steve Kuo kuosen...@gmail.com

 I have a newbie question on what is the best way to batch add/commit a
 large
 collection of document data via solrj.  My first attempt  was to write a
 multi-threaded application that did the following.

 Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
 for (Widget w : widgets) {
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField("id", w.getId());
     doc.addField("name", w.getName());
     doc.addField("price", w.getPrice());
     doc.addField("category", w.getCat());
     doc.addField("srcType", w.getSrcType());
     docs.add(doc);

     // commit docs to solr server
     server.add(docs);
     server.commit();
 }

 And I got this exception.

 org.apache.solr.common.SolrException:

 Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later


 Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later

at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at
 org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)

 The solrj wiki/documents seemed to indicate that this was because multiple threads
 were calling SolrServer.commit(), which in turn called
 CommonsHttpSolrServer.request(), resulting in multiple searchers.  My first
 thought was to change the configs for autowarming.  But after looking at
 the
 autowarm params, I am not sure what can be changed, or perhaps a different
 approach is recommended.

<filterCache
  class="solr.FastLRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

<queryResultCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

<documentCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

 Your help is much appreciated.



Solr 1.4 query fails against all fields, but succeed if field is specified.

2010-05-31 Thread olivier sallou
Hi,
I have created an index with several fields.
If I query my index in the admin section of solr (or via http request), I
get results for my search if I specify the requested field:
Query:   note:Aspergillus  (look for Aspergillus in field note)
However, if I query the same word against all fields  (Aspergillus or
all:Aspergillus), I have no match in the response from Solr.

Do you have any idea of what can be wrong with my index?

Regards

Olivier


Re: Solr 1.4 query fails against all fields, but succeed if field is specified.

2010-05-31 Thread olivier sallou
Ok,
I use the default, i.e. the standard request handler.
Using *:Aspergillus does not work either.

I can try with DisMax but this means that I know all field names. My schema
knows a number of them, but some other fields are defined via dynamic fields
(I know the type, but I do not know their names).
Is there any way to query all fields including dynamic ones?

thanks

Olivier

2010/5/31 Michael Kuhlmann michael.kuhlm...@zalando.de

 On 31.05.2010 11:50, olivier sallou wrote:
  Hi,
  I have created in index with several fields.
  If I query my index in the admin section of solr (or via http request), I
  get results for my search if I specify the requested field:
  Query:   note:Aspergillus  (look for Aspergillus in field note)
  However, if I query the same word against all fields  (Aspergillus or
  all:Aspergillus) , I have no match in response from Solr.

 Querying Aspergillus without a field does only work if you're using
 DisMaxHandler.

 Do you have a field all?

 Try *:Aspergillus instead.



Re: Solr 1.4 query fails against all fields, but succeed if field is specified.

2010-05-31 Thread olivier sallou
I finally got a solution. As I use dynamic fields, I use copyField to copy to a
global indexed field, and specify this field as the defaultSearchField
in my schema.

The *:term query with the standard query type fails without this...

This solution requires roughly doubling the indexed data, but it works in all
cases...

In my schema I have:
<field name="note" type="text" indexed="true" stored="false"/>
Some other fields are lowercase or int types.
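
For illustration, the copyField setup described above might look roughly like
this in schema.xml (the field and type names here are assumptions):

<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="all"/>
<defaultSearchField>all</defaultSearchField>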

Regards

2010/5/31 Michael Kuhlmann michael.kuhlm...@zalando.de

 On 31.05.2010 12:36, olivier sallou wrote:
  Is there any way to query all fields including dynamic ones?

 Yes, using the *:term query. (Please note that the asterisk should not
 be quoted.)

 To answer your question, we need more details on your Solr
 configuration, esp. the part of schema.xml that defines your note field.

 Greetings,
 Michael





Re: ubuntu lucid package

2010-04-30 Thread Olivier Dobberkau

 On 30.04.2010 at 09:24, Gora Mohanty wrote:

 Also, the standard Debian/Ubuntu way of finding out what files a
 package installed is:
  dpkg -l pkg_name
 
 Regards,
 Gora

You might try:

# dpkg -L solr-common
/.
/etc
/etc/solr
/etc/solr/web.xml
/etc/solr/conf
/etc/solr/conf/admin-extra.html
/etc/solr/conf/elevate.xml
/etc/solr/conf/mapping-ISOLatin1Accent.txt
/etc/solr/conf/protwords.txt
/etc/solr/conf/schema.xml
/etc/solr/conf/scripts.conf
/etc/solr/conf/solrconfig.xml
/etc/solr/conf/spellings.txt
/etc/solr/conf/stopwords.txt
/etc/solr/conf/synonyms.txt
/etc/solr/conf/xslt
/etc/solr/conf/xslt/example.xsl
/etc/solr/conf/xslt/example_atom.xsl
/etc/solr/conf/xslt/example_rss.xsl
/etc/solr/conf/xslt/luke.xsl
/usr
/usr/share
/usr/share/solr
/usr/share/solr/WEB-INF
/usr/share/solr/WEB-INF/lib
/usr/share/solr/WEB-INF/lib/apache-solr-core-1.4.0.jar
/usr/share/solr/WEB-INF/lib/apache-solr-dataimporthandler-1.4.0.jar
/usr/share/solr/WEB-INF/lib/apache-solr-solrj-1.4.0.jar
/usr/share/solr/WEB-INF/weblogic.xml
/usr/share/solr/scripts
/usr/share/solr/scripts/abc
/usr/share/solr/scripts/abo
/usr/share/solr/scripts/backup
/usr/share/solr/scripts/backupcleaner
/usr/share/solr/scripts/commit
/usr/share/solr/scripts/optimize
/usr/share/solr/scripts/readercycle
/usr/share/solr/scripts/rsyncd-disable
/usr/share/solr/scripts/rsyncd-enable
/usr/share/solr/scripts/rsyncd-start
/usr/share/solr/scripts/rsyncd-stop
/usr/share/solr/scripts/scripts-util
/usr/share/solr/scripts/snapcleaner
/usr/share/solr/scripts/snapinstaller
/usr/share/solr/scripts/snappuller
/usr/share/solr/scripts/snappuller-disable
/usr/share/solr/scripts/snappuller-enable
/usr/share/solr/scripts/snapshooter
/usr/share/solr/admin
/usr/share/solr/admin/_info.jsp
/usr/share/solr/admin/action.jsp
/usr/share/solr/admin/analysis.jsp
/usr/share/solr/admin/analysis.xsl
/usr/share/solr/admin/distributiondump.jsp
/usr/share/solr/admin/favicon.ico
/usr/share/solr/admin/form.jsp
/usr/share/solr/admin/get-file.jsp
/usr/share/solr/admin/get-properties.jsp
/usr/share/solr/admin/header.jsp
/usr/share/solr/admin/index.jsp
/usr/share/solr/admin/jquery-1.2.3.min.js
/usr/share/solr/admin/meta.xsl
/usr/share/solr/admin/ping.jsp
/usr/share/solr/admin/ping.xsl
/usr/share/solr/admin/raw-schema.jsp
/usr/share/solr/admin/registry.jsp
/usr/share/solr/admin/registry.xsl
/usr/share/solr/admin/replication
/usr/share/solr/admin/replication/header.jsp
/usr/share/solr/admin/replication/index.jsp
/usr/share/solr/admin/schema.jsp
/usr/share/solr/admin/solr-admin.css
/usr/share/solr/admin/solr_small.png
/usr/share/solr/admin/stats.jsp
/usr/share/solr/admin/stats.xsl
/usr/share/solr/admin/tabular.xsl
/usr/share/solr/admin/threaddump.jsp
/usr/share/solr/admin/threaddump.xsl
/usr/share/solr/admin/debug.jsp
/usr/share/solr/admin/dataimport.jsp
/usr/share/solr/favicon.ico
/usr/share/solr/index.jsp
/usr/share/doc
/usr/share/doc/solr-common
/usr/share/doc/solr-common/changelog.Debian.gz
/usr/share/doc/solr-common/README.Debian
/usr/share/doc/solr-common/TODO.Debian
/usr/share/doc/solr-common/copyright
/usr/share/doc/solr-common/changelog.gz
/usr/share/doc/solr-common/NOTICE.txt.gz
/usr/share/doc/solr-common/README.txt.gz
/var
/var/lib
/var/lib/solr
/var/lib/solr/data
/usr/share/solr/WEB-INF/lib/xml-apis.jar
/usr/share/solr/WEB-INF/lib/xml-apis-ext.jar
/usr/share/solr/WEB-INF/lib/slf4j-jdk14.jar
/usr/share/solr/WEB-INF/lib/slf4j-api.jar
/usr/share/solr/WEB-INF/lib/lucene-spellchecker.jar
/usr/share/solr/WEB-INF/lib/lucene-snowball.jar
/usr/share/solr/WEB-INF/lib/lucene-queries.jar
/usr/share/solr/WEB-INF/lib/lucene-highlighter.jar
/usr/share/solr/WEB-INF/lib/lucene-core.jar
/usr/share/solr/WEB-INF/lib/lucene-analyzers.jar
/usr/share/solr/WEB-INF/lib/jetty-util.jar
/usr/share/solr/WEB-INF/lib/jetty.jar
/usr/share/solr/WEB-INF/lib/commons-io.jar
/usr/share/solr/WEB-INF/lib/commons-httpclient.jar
/usr/share/solr/WEB-INF/lib/commons-fileupload.jar
/usr/share/solr/WEB-INF/lib/commons-csv.jar
/usr/share/solr/WEB-INF/lib/commons-codec.jar
/usr/share/solr/WEB-INF/web.xml
/usr/share/solr/conf

If I reckon correctly, some parts of apache solr will not work with the ubuntu 
lucid distribution.

http://solr.dkd.local/update/extract
 throws an error:

The server encountered an internal error (lazy loading error
org.apache.solr.common.SolrException: lazy loading error at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
at

Maybe someone from ubuntu reading this list can confirm this.

Olivier
--

Olivier Dobberkau

d.k.d Internet Service GmbH
Kaiserstraße 73
60329 Frankfurt/Main

mail: olivier.dobber...@dkd.de
web: http://www.dkd.de


Re: Interesting stuff; Solr as a syslog store.

2010-02-12 Thread Olivier Dobberkau

 On 13.02.2010 at 03:02, Antonio Lobato wrote:

 Just thought this would be a neat story to share with you all.  I've really 
 grown to love Solr, it's something else!

Hi Antonio,

Great.

Would you also share the source code somewhere! 
May the Source be with you. 

Thanks.

Olivier




Re: How to set User.dir or CWD for Solr during Tomcat startup

2010-01-07 Thread Olivier Dobberkau

On 07.01.2010 at 00:07, Turner, Robbin J wrote:

 I've been doing a bunch of googling and haven't seen if there is a parameter
 to set within Tomcat other than the solr/home which is set up in the solr.xml
 under $CATALINA_HOME/conf/Catalina/localhost/.

Hi.

We set this in solr.xml

<Context docBase="/opt/solr-tomcat/apache-tomcat-6.0.20/webapps/solr.war"
         debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/opt/solr-tomcat/solr" override="true" />
</Context>

http://wiki.apache.org/solr/SolrTomcat#Simple_Example_Install

hope this helps.

olivier

--

Olivier Dobberkau
. . . . . . . . . . . . . .
Je TYPO3, desto d.k.d



RE: why no results?

2009-12-08 Thread Jaco Olivier
Hi Regan,

I am using STRING fields only for values that in most cases will be used
to FACET on.
I suggest using TEXT fields as per the default examples...

ALSO, remember that if you do not specify the
solr.LowerCaseFilterFactory, your search has just become case
sensitive. I struggled with that one before, so make sure what you are
indexing is what you are searching for.
* Stick to the default examples that are provided with the SOLR distro
and you should be fine.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Jaco Olivier

-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 08 December 2009 06:15
To: solr-user@lucene.apache.org
Subject: Re: why no results?



Tom Hill-7 wrote:
 
 Try solr.TextField instead.
 


Thanks Tom,

I've replaced the types section above with...

<types>
  <fieldtype name="string" class="solr.TextField"
             sortMissingLast="true" omitNorms="true" />
</types>


deleted my index, restarted Solr and re-indexed my documents - but the
search still returns nothing.

Do I need to change the type in the fields sections as well?

regan
-- 
View this message in context:
http://old.nabble.com/why-no-results--tp26688249p26688469.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: why no results?

2009-12-08 Thread Jaco Olivier
Hi,

Try changing your TEXT field to type text:
<field name="text" type="text" indexed="true"
       stored="false" multiValued="true" /> (without the  of course :))

That is your problem... also use the text type as per the default examples
in the SOLR distro :)
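
Background, briefly: solr.StrField is not analyzed, so the whole field value is
indexed as one token and ordinary word queries will not match it; solr.TextField
with a tokenizing analyzer is what full-text search needs. A sketch of the
fields section after that change (keeping id as a string, as suggested elsewhere
in this thread, and assuming a text fieldType like the default-example one
quoted earlier in this digest):

<field name="id" type="string" indexed="true" stored="true"
       multiValued="false" required="true" />
<field name="title" type="text" indexed="true" stored="true" multiValued="false" />
<field name="subtitle" type="text" indexed="true" stored="true" multiValued="false" />
<field name="body" type="text" indexed="true" stored="true" multiValued="false" />
<field name="text" type="text" indexed="true" stored="false" multiValued="true" />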

Jaco Olivier


-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 08 December 2009 05:44
To: solr-user@lucene.apache.org
Subject: why no results?


hi all - newbie solr question - I've indexed some documents and can search /
receive results using the following schema - BUT ONLY when searching on the
id field. If I try searching on the title, subtitle, body or text field I
receive NO results. Very confused. :confused: Can anyone see anything
obvious I'm doing wrong? Regan.



<?xml version="1.0" ?>

<schema name="core0" version="1.1">

<types>
  <fieldtype name="string" class="solr.StrField"
             sortMissingLast="true" omitNorms="true" />
</types>

 <fields>
  <!-- general -->
  <field name="id" type="string" indexed="true" stored="true"
         multiValued="false" required="true" />
  <field name="title" type="string" indexed="true" stored="true"
         multiValued="false" />
  <field name="subtitle" type="string" indexed="true" stored="true"
         multiValued="false" />
  <field name="body" type="string" indexed="true" stored="true"
         multiValued="false" />
  <field name="text" type="string" indexed="true" stored="false"
         multiValued="true" />
 </fields>

 <!-- field to use to determine and enforce document uniqueness. -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>text</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>

 <!-- copyFields group fields into one single searchable indexed field for speed. -->
 <copyField source="title" dest="text" />
 <copyField source="subtitle" dest="text" />
 <copyField source="body" dest="text" />

</schema>

-- 
View this message in context:
http://old.nabble.com/why-no-results--tp26688249p26688249.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: do copyField's need to exist as Fields?

2009-12-08 Thread Jaco Olivier
Hi Regan,

Something I noticed on your setup...
The ID field in your setup I assume to be your unique ID for the book or
journal (the ISSN or something).
Try making this a string, as TEXT is not the ideal field type to use for
unique IDs:

<field name="id" type="string" indexed="true" stored="true"
       multiValued="false" required="true" />

Congrats on figuring out SOLR fields - I suggest getting the SOLR 1.4
book. It really saved me a thousand questions on this mailing list :)

Jaco Olivier

-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 09 December 2009 00:48
To: solr-user@lucene.apache.org
Subject: Re: do copyField's need to exist as Fields?



regany wrote:
 
 Is there a different way I should be setting it up to achieve the
above??
 


Think I figured it out.

I set up the fields so they are present, but they get ignored except for the
text field, which gets indexed...

<field name="id" type="text" indexed="true" stored="true"
       multiValued="false" required="true" />
<field name="title" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="subtitle" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="body" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="text" type="text" indexed="true" stored="false"
       multiValued="true" />

and then copyField the first 4 fields to the text field:

<copyField source="id" dest="text" />
<copyField source="title" dest="text" />
<copyField source="subtitle" dest="text" />
<copyField source="body" dest="text" />


Seems to be working!? :drunk:
-- 
View this message in context:
http://old.nabble.com/do-copyField%27s-need-to-exist-as-Fields--tp267017
06p26702224.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: i want to use something like *query* similar to database - %query% like search

2009-12-02 Thread Olivier Dobberkau

On 02.12.2009 at 09:55, amittripathi wrote:

 it's accepting the trailing wildcard character but Solr is not accepting the
 leading wildcard character

The error message says it all:

'*' or '?' not allowed as first character in WildcardQuery

solr is not SQL.

Olivier

--

Olivier Dobberkau
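
If leading wildcards are really needed, one common workaround is to index each
token in reversed form as well; Solr 1.4 ships solr.ReversedWildcardFilterFactory
for exactly this, and the query parser then rewrites leading-wildcard queries
against the reversed terms. A rough, untested sketch of such a field type (the
name and tuning attribute values are illustrative):

<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index each token both as-is and reversed, enabling leading wildcards -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>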


Re: Showcase: Facetted Search for Wine using Solr

2009-09-28 Thread Olivier Dobberkau

Marian Steinbach wrote:

On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog goks...@gmail.com wrote:
  

Have you seen this? It is another Solr/Typeo3 integration project.

http://forge.typo3.org/projects/show/extension-solr

Would you consider open-sourcing your Solr/Typo3 integration?




Hi Lance!

I wasn't aware of that extension. Having looked at the website, it
does something very different from what we did. The solr extension
mentioned above tries to provide a better website search for the Typo3
CMS on top of Solr.

Our integration doesn't index web pages but product data from an XML
file. I'd say the implementation is pretty much customer-specific so
that I don't see a real benefit of making it open source.

Regards,

Marian
  


hi marian.
our extension will be able to do so also, once we have set up the
indexing queue for the typo3 backend.
we have a concept called typo3 extension connectors so that you will be
able to add index documents to your index.
feel free to contact ingo about the contribution possibilities in our
solr project.
if you use open source software you should definitely contribute. this
gives you great karma.

or as we at typo3 say: inspire people to share!

olivier


Re: filtering facets

2009-08-31 Thread Olivier H. Beauchesne

Hi Mike,

No, my problem is that the field article_outlinks is multivalued, thus it
contains several urls not related to my search. I would like to facet
only on urls matching my query.


For example (only on one document, but my search targets over 1M docs):

Doc1:
article_url:
url1.com/1
url2.com/2
url1.com/1
url1.com/3

And my query is: article_url:url1.com* and I facet by article_url and I 
want it to give me:

url1.com/1 (2)
url1.com/3 (1)

But right now, because url2.com/2 is contained in a multivalued field 
with the matching urls, I get this:

url1.com/1 (2)
url1.com/3 (1)
url2.com/2 (1)

I can use facet.prefix to filter, but it's not very flexible if my url 
contains a subdomain as facet.prefix doesn't support wildcards.


Thank you,

Olivier

Mike Topper wrote:

Hi Olivier,

are the facet counts on the urls you dont want 0?

if so you can use facet.mincount to only return results greater than 0.

-Mike

Olivier H. Beauchesne wrote:
  

Hi,

Long time lurker, first time poster.

I have a multi-valued field, let's call it article_outlinks containing
all outgoing urls from a document. I want to get all matching urls
sorted by counts.

For example, I want to get all outgoing wikipedia urls in my documents
sorted by counts.

So I execute a query like this:
q=article_outlinks:http*wikipedia.org*  and I facet on article_outlinks

But I get facets containing the other urls in the documents. I can get
something close by using facet.prefix=http://en.wikipedia.org but I
want to include other subdomains on wikipedia (ex: fr.wikipedia.org).

Is there a way to do a search and getting facets only matching my query?

I know facet.prefix isn't a query, but is there a way to get that
behavior?

Is it easy to extend solr to do something like that?

Thank you,

Olivier

Sorry for my english.





  


Re: filtering facets

2009-08-31 Thread Olivier H. Beauchesne
yeah, but then I would have to retrieve *a lot* of facets. I think for
now I'll retrieve all the subdomains with facet.prefix and then merge
those queries. Not ideal, but when I have more motivation, I will
submit a patch to solr :-)
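
Spelled out, that interim approach is one facet request per known subdomain,
e.g. (standard facet parameters; the subdomain list is illustrative and
URL-encoding is omitted for readability):

q=article_outlinks:http*wikipedia.org*&facet=true&facet.field=article_outlinks&facet.prefix=http://en.wikipedia.org&facet.mincount=1
q=article_outlinks:http*wikipedia.org*&facet=true&facet.field=article_outlinks&facet.prefix=http://fr.wikipedia.org&facet.mincount=1

The client then merges the per-URL counts from the individual responses.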


Michael wrote:

You could post-process the response and remove urls that don't match your
domain pattern.

On Mon, Aug 31, 2009 at 9:45 AM, Olivier H. Beauchesne oliv...@olihb.comwrote:

  

Hi Mike,

No, my problem is that the field article_outlinks is multivalued, thus it
contains several urls not related to my search. I would like to facet only
on urls matching my query.

For example (only on one document, but my search targets over 1M docs):

Doc1:
article_url:
url1.com/1
url2.com/2
url1.com/1
url1.com/3

And my query is: article_url:url1.com* and I facet by article_url and I
want it to give me:
url1.com/1 (2)
url1.com/3 (1)

But right now, because url2.com/2 is contained in a multivalued field with
the matching urls, I get this:
url1.com/1 (2)
url1.com/3 (1)
url2.com/2 (1)

I can use facet.prefix to filter, but it's not very flexible if my url
contains a subdomain as facet.prefix doesn't support wildcards.

Thank you,

Olivier

Mike Topper wrote:

 Hi Olivier,


are the facet counts on the urls you dont want 0?

if so you can use facet.mincount to only return results greater than 0.

-Mike

Olivier H. Beauchesne wrote:


  

Hi,

Long time lurker, first time poster.

I have a multi-valued field, let's call it article_outlinks containing
all outgoing urls from a document. I want to get all matching urls
sorted by counts.

For example, I want to get all outgoing wikipedia urls in my documents
sorted by counts.

So I execute a query like this:
q=article_outlinks:http*wikipedia.org*  and I facet on article_outlinks

But I get facets containing the other urls in the documents. I can get
something close by using facet.prefix=http://en.wikipedia.org but I
want to include other subdomains on wikipedia (ex: fr.wikipedia.org).

Is there a way to do a search and getting facets only matching my query?

I know facet.prefix isn't a query, but is there a way to get that
behavior?

Is it easy to extend solr to do something like that?

Thank you,

Olivier

Sorry for my english.







  


  


filtering facets

2009-08-30 Thread Olivier H. Beauchesne

Hi,

Long time lurker, first time poster.

I have a multi-valued field, let's call it article_outlinks containing 
all outgoing urls from a document. I want to get all matching urls 
sorted by counts.


For example, I want to get all outgoing wikipedia urls in my documents
sorted by counts.


So I execute a query like this:
q=article_outlinks:http*wikipedia.org*  and I facet on article_outlinks

But I get facets containing the other urls in the documents. I can get 
something close by using facet.prefix=http://en.wikipedia.org but I want 
to include other subdomains on wikipedia (ex: fr.wikipedia.org).


Is there a way to do a search and getting facets only matching my query?

I know facet.prefix isn't a query, but is there a way to get that behavior?

Is it easy to extend solr to do something like that?

Thank you,

Olivier

Sorry for my english.


Re: Solr CMS Integration

2009-08-07 Thread Olivier Dobberkau


On 07.08.2009 at 19:01, wojtekpia wrote:

I've been asked to suggest a framework for managing a website's content and
making all that content searchable. I'm comfortable using Solr for search,
but I don't know where to start with the content management system. Is
anyone using a CMS (open source or commercial) that you've integrated with
Solr for search and are happy with? This will be a consumer-facing website
with a combination of articles, blogs, white papers, etc.



Hi Wojtek,

Have a look at TYPO3. http://typo3.org/
It is quite powerful.
Ingo and I are currently implementing a SOLR extension for it.
We currently use it at http://www.be-lufthansa.com/
Contact me if you want an insight.

Many greetings,

Olivier


--
Olivier Dobberkau
. . . . . . . . . . . . . .
Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstrasse 73
D 60329 Frankfurt/Main

Fon:  +49 (0)69 - 247 52 18 - 0
Fax:  +49 (0)69 - 247 52 18 - 99

Mail: olivier.dobber...@dkd.de
Web: http://www.dkd.de

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

Aktuelle Projekte:
http://bewegung.taz.de - Launch (Ruby on Rails)
http://www.hans-im-glueck.de - Relaunch (TYPO3)
http://www.proasyl.de - Relaunch (TYPO3)



Re: Best approach to multiple languages

2009-07-22 Thread Olivier Dobberkau


On 22.07.2009 at 18:31, Ed Summers wrote:


In case you are curious I've attached a copy of our schema.xml to give
you an idea of what we did.



Thanks for sharing!

--
Olivier Dobberkau


Apachecon 2009 Europe

2009-03-27 Thread Olivier Dobberkau

Hi all,

I came back with a head full of impressions from ApacheCon Europe.
Thanks a lot for the great speeches and the inspiring personal talks.

I strongly believe that Solr will have a great future.

Olivier

--
Olivier Dobberkau
d.k.d Internet Service GmbH
fon:  +49 (0)69 - 43 05 61-70 fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de home: http://www.dkd.de


Re: Severe errors in solr configuration

2009-02-05 Thread Olivier Dobberkau


On 05.02.2009 at 12:07, Anto Binish Kaspar wrote:


Do I need to give some permissions to the folder?



I would guess so.

Olivier
--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


On 04.02.2009 at 13:33, Anto Binish Kaspar wrote:


Hi,
I am trying to configure Solr on an Ubuntu server and I am getting the
following exception. I am able to make it work on a Windows box.



Hi Anto.

Have you installed the Solr 1.2 package from Ubuntu?
Or the 1.3 release as a war file?

Olivier

--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


On 04.02.2009 at 13:54, Anto Binish Kaspar wrote:


Hi Olivier

Thanks for your quick reply. I am using the 1.3 release as a war file.

- Anto Binish Kaspar


OK.
As far as I understood, you need to make sure that your Solr home is set.
This needs to be done as described below.

Quoting:

http://wiki.apache.org/solr/SolrTomcat

In addition to using the default behavior of relying on the Solr Home  
being in the current working directory (./solr) you can alternately  
add the solr.solr.home system property to your JVM settings before  
starting Tomcat...


export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/dir/"

...or use a Context file to configure the Solr Home using JNDI

A Tomcat context fragment can be used to configure the JNDI property
needed to specify your Solr Home directory.


Just put a context fragment file under $CATALINA_HOME/conf/Catalina/localhost
that looks something like this...


$ cat /tomcat55/conf/Catalina/localhost/solr.xml

<Context docBase="/some/path/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/my/solr/home" override="true" />
</Context>

Greetings,

Olivier

PS: Maybe it would be great if we could provide an Ubuntu dpkg with
1.3? Any takers?


--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau

A slash?

Olivier

Sent from my iPhone
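
In other words, the value appears to be missing its leading slash, and the JNDI
name is normally solr/home rather than /solr/home (compare the wiki fragment
quoted below). A corrected fragment would presumably look like this, assuming
the same paths:

<Context docBase="/usr/local/solr/solr-1.3/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/usr/local/solr/solr-1.3/solr" override="true" />
</Context>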


On 04.02.2009 at 14:06, Anto Binish Kaspar antobin...@ec.is wrote:


I am using a Context file; here is my solr.xml:

$ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml

<Context docBase="/usr/local/solr/solr-1.3/solr.war"
         debug="0" crossContext="true">
  <Environment name="/solr/home" type="java.lang.String"
               value="usr/local/solr/solr-1.3/solr" override="true" />
</Context>

I changed the ownership of the folder (usr/local/solr/solr-1.3/solr)
from root:root to tomcat6:tomcat6.


Anything I am missing?

- Anto Binish Kaspar


-Original Message-
From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de]
Sent: Wednesday, February 04, 2009 6:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe errors in solr configuration


On 04.02.2009 at 13:54, Anto Binish Kaspar wrote:


Hi Olivier

Thanks for your quick reply. I am using the 1.3 release as a war file.

- Anto Binish Kaspar


OK.
As far as I understood, you need to make sure that your Solr home is
set.

This needs to be done as described below.

Quoting:

http://wiki.apache.org/solr/SolrTomcat

In addition to using the default behavior of relying on the Solr Home
being in the current working directory (./solr) you can alternately
add the solr.solr.home system property to your JVM settings before
starting Tomcat...

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/dir/"


...or use a Context file to configure the Solr Home using JNDI

A Tomcat context fragment can be used to configure the JNDI property
needed to specify your Solr Home directory.

Just put a context fragment file under $CATALINA_HOME/conf/Catalina/localhost
that looks something like this...

$ cat /tomcat55/conf/Catalina/localhost/solr.xml

<Context docBase="/some/path/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/my/solr/home" override="true" />
</Context>

Greetings,

Olivier

PS: Maybe it would be great if we could provide an Ubuntu dpkg with
1.3? Any takers?

--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)



Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


On 04.02.2009 at 15:50, Anto Binish Kaspar wrote:

Yes, I removed it; I still have the same issue. Any idea what may be the
cause of this issue?



Have you solved your problem?

Olivier
--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)