Re: SOLR 1.2 - Duplicate Documents??

2007-11-07 Thread Ryan McKinley


Schema.xml
 <field name="id" type="string" indexed="true" stored="true"/>


Have you edited schema.xml since building a full index from scratch?  If 
so, try rebuilding the index.


People often get the behavior you describe if the 'id' is a 'text' field.
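In schema.xml terms, the point is that the uniqueKey field should use a string type (which is not analyzed), not a text type. A sketch along the lines of the stock example schema — names here follow that example and may differ from yours:

```xml
<!-- string fields are stored verbatim; no tokenization, so ids match exactly -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<field name="id" type="string" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>
```

If the id were a text type, its analyzer could split or transform the value, so two adds of the "same" document would no longer collide on the uniqueKey.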

ryan



Re: SOLR 1.2 - Duplicate Documents??

2007-11-07 Thread Chris Hostetter
: Hey all, I have a fairly odd case of duplicate documents in our solr index
: (See attached xml sample). The index is roughly 35k documents. The only

How did you index those documents?  

Any chance you inadvertently set the allowDups="true" attribute when 
sending them to Solr (possibly because of an option whose meaning you 
didn't fully understand in solrj or solr-ruby etc...)

?
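For context, allowDups is an attribute on the <add> element of the XML update message. A rough sketch (per the 1.x update-message format; the <doc> contents are elided):

```xml
<!-- default: a new document with the same uniqueKey replaces the old one -->
<add allowDups="false">
  <doc>...</doc>
</add>

<!-- duplicates allowed: the existing document is NOT replaced -->
<add allowDups="true">
  <doc>...</doc>
</add>
```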




-Hoss



MultiCore unregister

2007-11-07 Thread John Reuning
For the MultiCore experts, is there an acceptable or approved way to 
close and unregister a single SolrCore?  I'm interested in stopping 
cores, manipulating the solr directory tree, and reregistering them.


Thanks,

-John R.


Search Multiple indexes In Solr

2007-11-07 Thread j 90
Hi, I'm new to Solr but very familiar with Lucene.

Is there a way to have Solr search in more than one index, much like the
MultiSearcher in Lucene ?

If so, how do I configure the location of the indexes?


Re: SOLR 1.2 - Duplicate Documents??

2007-11-07 Thread realw5

I haven't made any changes to the schema since the initial full-index. Do you
know if there is a way to rebuild the full index in the background, without
having to take down the current live index?

Dan



ryantxu wrote:
 
 
 Schema.xml
  <field name="id" type="string" indexed="true" stored="true"/>
 
 Have you edited schema.xml since building a full index from scratch?  If 
 so, try rebuilding the index.
 
 People often get the behavior you describe if the 'id' is a 'text' field.
 
 ryan
 
 
 

-- 
View this message in context: 
http://www.nabble.com/SOLR-1.2---Duplicate-Documents---tf4762687.html#a13629639
Sent from the Solr - User mailing list archive at Nabble.com.



Re: start.jar -Djetty.port= not working

2007-11-07 Thread Mike Davies
Hi Brian,

Found the SVN location, will download from there and give it a try.

Thanks for the help.



On 07/11/2007, Mike Davies [EMAIL PROTECTED] wrote:

 I'm using 1.2, downloaded from

 http://apache.rediris.es/lucene/solr/

 Where can I get the trunk version?




 On 07/11/2007, Brian Whitman [EMAIL PROTECTED] wrote:
 
 
  On Nov 7, 2007, at 10:00 AM, Mike Davies wrote:
   java -Djetty.port=8521 -jar start.jar
  
   However when I run this it seems to ignore the command and still
   start on
   the default port of 8983.  Any suggestions?
  
 
  Are you using trunk solr or 1.2? I believe 1.2 still shipped with an
  older version of jetty that doesn't follow the new-style CL
  arguments. I just tried it on trunk and it worked fine for me.
 
 
 
 
 
 
 
  --
  http://variogr.am/
  [EMAIL PROTECTED]
 
 
 
 



Re: start.jar -Djetty.port= not working

2007-11-07 Thread Brian Whitman


On Nov 7, 2007, at 10:00 AM, Mike Davies wrote:

java -Djetty.port=8521 -jar start.jar

However when I run this it seems to ignore the command and still  
start on

the default port of 8983.  Any suggestions?



Are you using trunk solr or 1.2? I believe 1.2 still shipped with an  
older version of jetty that doesn't follow the new-style CL  
arguments. I just tried it on trunk and it worked fine for me.








--
http://variogr.am/
[EMAIL PROTECTED]





Re: Can you parse the contents of a field to populate other fields?

2007-11-07 Thread Yonik Seeley
On 11/6/07, Kristen Roth [EMAIL PROTECTED] wrote:
 Yonik - thanks so much for your help!  Just to clarify; where should the
 regex go for each field?

Each field should have a different FieldType (referenced by the type
XML attribute).  Each fieldType can have its own analyzer.  You can
use a different PatternTokenizer (which specifies a regex) for each
analyzer.

-Yonik


Re: Sorting problem

2007-11-07 Thread Ryan McKinley


Does anyone know what could be the problem?



looks like it was a problem in the new query parser.  I just fixed it in 
trunk:

http://svn.apache.org/viewvc?view=rev&revision=592740

Yonik - do we want to keep this checking for 'null', or should we change 
QueryParser.parseSort( ) to always return a valid sortSpec?


ryan


Re: Sorting problem

2007-11-07 Thread Ryan McKinley

Yonik Seeley wrote:

On 11/7/07, Ryan McKinley [EMAIL PROTECTED] wrote:

Yonik - do we want to keep this checking for 'null', or should we change
QueryParser.parseSort( ) to always return a valid sortSpec?


In Lucene, a null sort is not equal to score desc... they result in
the same documents being returned, but the former takes a different
code path and is faster.



right, but solr QueryParsing.SortSpec holds a lucene Sort  -- in either 
case the lucene Sort object is null.


Since num & offset were added to SortSpec, it can't be null anymore (I 
don't think)


ryan


Re: highlight and wildcards ?

2007-11-07 Thread Kamran Shadkhast

I fixed this problem by returning super.getPrefixQuery(field, termStr);
in solr.search.SolrQueryParser and it worked for me.

-Kamran

Mike Klaas wrote:
 
 On 7-Jun-07, at 5:27 PM, Frédéric Glorieux wrote:
 
 Hoss,

 Thanks for all your information and pointers. I know that my  
 problems are not mainstream.
 
 Have you tried commenting out getPrefixQuery in  
 solr.search.SolrQueryParser?  It should then revert to a regular  
 lucene prefix query.
 
 -Mike
 




Simple sorting questions

2007-11-07 Thread Ronald K. Braun
Pardon the basic nature of these questions, but I'm just getting started
with SOLR and have a couple of confusions regarding sorting that I
couldn't resolve based on the docs or an archive search.

1. There appear to be (at least) two ways to specify sorting, one
involving an append to the q parm and the other using the sort parm.
Are these exactly equivalent?

   http://localhost/solr/select/?q=martha;author+asc
   http://localhost/solr/select/?q=martha&sort=author+asc

2. The docs say that sorting can only be applied to non-multivalued
fields.  Does this mean that sorting won't work *at all* for
multi-valued fields or only that the behaviour is indeterminate?
Based on a brief test, sorting a multi-valued field appeared to work
by picking an arbitrary value when multiple values are present and
using that for the sort.  I wanted to confirm that the expected
behaviour is indeed to sort on something (with no guarantees as to
what), as opposed to, say, dropping the record, putting records with
multiple values at the end along with records missing values, or
something else entirely.

Thanks!

Ron


RE: Can you parse the contents of a field to populate other fields?

2007-11-07 Thread Kristen Roth
So, I think I have things set up correctly in my schema, but it doesn't
appear that any logic is being applied to my Category_# fields - they
are being populated with the full string copied from the Category field
(facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc.

I have several different field types, each with a different regex to
match a specific part of the input string.  In this example, I'm
matching facet1 in input string facet1::facet2::facet3...facetn

<fieldtype name="cat1str" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory"
pattern="^([^:]+)" group="1"/>
</analyzer>
</fieldtype>

I have copyfields set up for each Category_# field.  Anything obviously
wrong?

Thanks!
Kristen
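As a sanity check that the pattern itself isolates the first facet (assuming PatternTokenizerFactory applies ordinary regex-group semantics; the sample string is hypothetical), the equivalent match sketched in Python:

```python
import re

category = "facet1::facet2::facet3"

# group=1 in the tokenizer config corresponds to group(1) here:
# one or more leading non-colon characters.
match = re.match(r"^([^:]+)", category)
print(match.group(1))  # facet1
```

If this prints the whole string instead of facet1, the problem is the regex; if the regex is fine (as here), the problem is more likely that the copyField source is being analyzed with the wrong field type.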

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, November 07, 2007 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Can you parse the contents of a field to populate other
fields?

On 11/6/07, Kristen Roth [EMAIL PROTECTED] wrote:
 Yonik - thanks so much for your help!  Just to clarify; where should
the
 regex go for each field?

Each field should have a different FieldType (referenced by the type
XML attribute).  Each fieldType can have its own analyzer.  You can
use a different PatternTokenizer (which specifies a regex) for each
analyzer.

-Yonik


Re: how to use PHP AND PHPS?

2007-11-07 Thread Dave Lewis


On Nov 7, 2007, at 2:04 AM, James liu wrote:

I just reduced the response information... and you will see my result
(full, not partial)

*before unserialize*

string(433)
a:2:{s:14:responseHeader;a:3:{s:6:status;i:0;s:5:QTime;i: 
0;s:6:params;a:7:{s:2:fl;s:5:Title;s:6:indent;s:2:on;s: 
5:start;s:1:0;s:1:q;s:1:2;s:2:wt;s:4:phps;s:4:rows;a: 
2:{i:0;s:1:2;i:1;s:2:10;}s:7:version;s:3:
2.2;}}s:8:response;a:3:{s:8:numFound;i:28;s:5:start;i:0;s: 
4:docs;a:2:{i:0;a:1:{s:5:Title;d:诺基亚N-Gage基本数据;}i:1;a:1: 
{s:5:Title;d:索尼爱立信P908基本数据;


*after unserialize...*
bool(false)



and I wrote serialize test code:

<?php

$ar = array (
array('id' => 123, 'Title' => '中文测试'),
array('id' => 123, 'Title' => '中国上海'),
);

echo serialize($ar);

?>




and result is :



a:2:{i:0;a:2:{s:2:"id";i:123;s:5:"Title";s:12:"中文测试";}i:1;a:2:{s:2:"id";i:123;s:5:"Title";s:12:"中国上海";}}






*php* result is:

string(369) array( 'responseHeader'=>array( 'status'=>0, 'QTime'=>0,
'params'=>array( 'fl'=>'Title', 'indent'=>'on', 'start'=>'0',
'q'=>'2',
'wt'=>'php', 'rows'=>array('2', '10'), 'version'=>'2.2')),
'response'=>array('numFound'=>28,'start'=>0,'docs'=>array( array(
'Title'=>诺基亚N-Gage基本数据), array( 'Title'=>索尼爱立信P908基本数据)) ))


it is a string, so I can't read it correctly from PHP.





This part (after string(369)) is exactly what you should be seeing  
if you use the php handler, and it's what you get after you  
unserialize when using phps.


You can access your search results as:

$solrResults['response']['docs'];

In your example above, that would be:

array( array('Title'=>诺基亚N-Gage基本数据), array( 'Title'=>索尼爱立信P908基本数据))


When using the php handler, you must do something like this:

eval('$solrResults = ' .$serializedSolrResults . ';');

Then, as above, you can access $solrResults['response']['docs'].

To sum up, if you use phps, you must unserialize the results.  If you  
use php, you must eval the results (including some sugar to get a  
variable set to that value).



dave




RE: Analysis / Query problem

2007-11-07 Thread Wagner,Harry
Thanks Erik.  That helps.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 07, 2007 11:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Analysis / Query problem


On Nov 7, 2007, at 10:26 AM, Wagner,Harry wrote:
 I have the following custom field defined for author names.   After  
 indexing the 2 documents below the admin analysis tool looks right  
 for field-name=au and field-value=Schröder, Jürgen   The highlight  
 matching also seems right.  However, if I search for au:Schröder,  
 Jürgen using the admin tool I do not get any hits (see below).   
 This appears to be the case whenever there are 2 non-ascii  
 characters in the author name.  Searching for au:Schröder, Jurgen  
 finds both of these records.  Any idea what is causing this?


 <response>

 <lst name="responseHeader">

 <int name="status">0</int>

 <int name="QTime">0</int>

 <lst name="params">

 <str name="indent">on</str>

 <str name="start">0</str>

 <str name="q">au:Schröder, Jürgen</str>

One thing to note is that query au:Schröder, Jürgen is being  
translated (try debugQuery=true to see) to:

au:schröder  AND/OR defaultField:jürgen

AND/OR depends on how you have things configured, as well as the  
default field.

You probably want to use the ISOLatin1AccentFilterFactory to have the  
diacritics flattened to the ASCII character they look like.
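A sketch of where that filter would sit in the field's analyzer chain — the tokenizer and companion filters here are illustrative placeholders; adapt them to however the 'au' field type is actually defined:

```xml
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- flatten ö -> o, ü -> u, etc., so accented and plain forms match -->
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The same chain must be applied at both index and query time (or be declared once without a type attribute) so both sides see the flattened form.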

Erik




Re: Analysis / Query problem

2007-11-07 Thread Erik Hatcher


On Nov 7, 2007, at 10:26 AM, Wagner,Harry wrote:
I have the following custom field defined for author names.   After  
indexing the 2 documents below the admin analysis tool looks right  
for field-name=au and field-value=Schröder, Jürgen   The highlight  
matching also seems right.  However, if I search for au:Schröder,  
Jürgen using the admin tool I do not get any hits (see below).   
This appears to be the case whenever there are 2 non-ascii  
characters in the author name.  Searching for au:Schröder, Jurgen  
finds both of these records.  Any idea what is causing this?





<response>

<lst name="responseHeader">

<int name="status">0</int>

<int name="QTime">0</int>

<lst name="params">

<str name="indent">on</str>

<str name="start">0</str>

<str name="q">au:Schröder, Jürgen</str>


One thing to note is that query au:Schröder, Jürgen is being  
translated (try debugQuery=true to see) to:


au:schröder  AND/OR defaultField:jürgen

AND/OR depends on how you have things configured, as well as the  
default field.


You probably want to use the ISOLatin1AccentFilterFactory to have the  
diacritics flattened to the ASCII character they look like.


Erik




Re: start.jar -Djetty.port= not working

2007-11-07 Thread Brian Whitman



On Nov 7, 2007, at 10:07 AM, Mike Davies wrote:

I'm using 1.2, downloaded from

http://apache.rediris.es/lucene/solr/

Where can I get the trunk version?


svn, or http://people.apache.org/builds/lucene/solr/nightly/




restricting search to a set of documents

2007-11-07 Thread briand

I need to perform a search against a limited set of documents.  I have the
set of document ids, but was wondering what is the best way to formulate the
query to SOLR? 



Re: restricting search to a set of documents

2007-11-07 Thread Mike Klaas

On 7-Nov-07, at 2:27 PM, briand wrote:



I need to perform a search against a limited set of documents.  I  
have the
set of document ids, but was wondering what is the best way to  
formulate the

query to SOLR?


add fq=docId:(id1 id2 id3 id4 id5...)

cheers,
-Mike
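A small sketch (assuming the uniqueKey field is named docId, as above) of building that filter query and URL-encoding it for the request — the query text is a placeholder:

```python
from urllib.parse import urlencode

doc_ids = ["id1", "id2", "id3", "id4", "id5"]

# Restrict results to the given ids via a filter query (fq),
# which constrains the result set without affecting scoring.
fq = "docId:(" + " ".join(doc_ids) + ")"
params = urlencode({"q": "user query here", "fq": fq})

print(fq)      # docId:(id1 id2 id3 id4 id5)
print(params)  # ready to append to /solr/select/?
```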


unsubscribe

2007-11-07 Thread Jeryl Cook
Jeryl Cook  /^\ Pharaoh /^\  http://pharaohofkush.blogspot.com/  ..Act your 
age, and not your shoe size.. -Prince(1986)

 From: [EMAIL PROTECTED] Subject: Re: start.jar -Djetty.port= not working 
 Date: Wed, 7 Nov 2007 10:13:22 -0500 To: solr-user@lucene.apache.org
 On Nov 7, 2007, at 10:07 AM, Mike Davies wrote:  I'm using 1.2, downloaded 
 from   http://apache.rediris.es/lucene/solr/   Where can i get 
 the trunk version?  svn, or 
 http://people.apache.org/builds/lucene/solr/nightly/  

Re: What is the best way to index xml data preserving the mark up?

2007-11-07 Thread Walter Underwood
If you really, really need to preserve the XML structure, you'll
be doing a LOT of work to make Solr do that. It might be cheaper
to start with software that already does that. I recommend
MarkLogic -- I know the principals there, and it is some seriously
fine software. Not free or open, but very, very good.

If your problem can be expressed in a flat field model, then
your problem is mapping your document model into Solr. You might
be able to use structured field names to represent the XML context,
but that is just a guess.

With a mixed corpus of XML and arbitrary text, requiring special
handling of XML, yow, that's a lot of work.

One thought -- you can do flat fields in an XML engine (like MarkLogic)
much more easily than you can do XML in a flat field engine (like Lucene).

wunder

On 11/7/07 8:18 PM, David Neubert [EMAIL PROTECTED] wrote:

 I am sure this is a 101 question, but I am a bit confused about indexing xml data
 using SOLR.
 
 I have rich xml content (books) that needs to be searched at granular levels
 (specifically paragraph and sentence levels very accurately, no
 approximations).  My source text has exact <p></p> and <s></s> tags for this
 purpose.  I have built this app in previous versions (using other search
 engines) indexing the text twice, (1) where every paragraph was a virtual
 document and (2) where every sentence was a virtual document -- both
 extracted from the source file (which was a single xml file for the entire
 book).  I have of course thought about using an XML engine, eXists or Xindices,
 but I prefer the stability, user base, and performance that
 Lucene/SOLR seems to have, and also there is a large body of text that consists
 of regular documents and is not well-formed XML.
 
 I am brand new to SOLR (one day) and at a basic level understand SOLR's nice
 simple xml scheme to add documents:
 
 <add>
   <doc>
 <field name="foo1">foo value 1</field>
 <field name="foo2">foo value 2</field>
   </doc>
   <doc>...</doc>
 </add>
 
 But my problem is that I believe I need to preserve the xml markup at the
 paragraph and sentence levels, so I was hoping to create a content field that
 could just contain the source xml for the paragraph or sentence respectively.
 There are reasons for this that I won't go into -- a lot of granular work in
 this app, accessing pars and sens.
 
 Obviously an XML mechanism that could leverage the xml structure (via XPath or
 XPointers) would work great.  Still, I think Lucene can do this in a field-level
 way -- and I also can't imagine that users who are indexing XML documents
 have to go through the trouble of stripping all the markup before indexing?
 Hopefully I'm missing something basic?
 
 It would be great to be pointed in the right direction on this matter.
 
 I think I need something along this line:
 
 <add>
   <doc>
 <field name="foo1">value 1</field>
 <field name="foo2">value 2</field>
 
 <field name="content">an xml stream with embedded source markup</field>
   </doc>
 </add>
 
 Maybe the overall question -- is what is the best way to index XML content
 using SOLR -- is all this tag stripping really necessary?
 
 Thanks for any help,
 
 Dave
 
 
 
 
 
 __
 Do You Yahoo!?
 Tired of spam?  Yahoo! Mail has the best spam protection around
 http://mail.yahoo.com 



What is the best way to index xml data preserving the mark up?

2007-11-07 Thread David Neubert
I am sure this is a 101 question, but I am a bit confused about indexing xml data
using SOLR.

I have rich xml content (books) that needs to be searched at granular levels
(specifically paragraph and sentence levels very accurately, no
approximations).  My source text has exact <p></p> and <s></s> tags for this
purpose.  I have built this app in previous versions (using other search
engines) indexing the text twice, (1) where every paragraph was a virtual
document and (2) where every sentence was a virtual document -- both extracted
from the source file (which was a single xml file for the entire book).  I have
of course thought about using an XML engine, eXists or Xindices, but I prefer
the stability, user base, and performance that Lucene/SOLR seems to have, and
also there is a large body of text that consists of regular documents and is
not well-formed XML.

I am brand new to SOLR (one day) and at a basic level understand SOLR's nice 
simple xml scheme to add documents:

<add>
  <doc>
<field name="foo1">foo value 1</field>
<field name="foo2">foo value 2</field>
  </doc>
  <doc>...</doc>
</add>

But my problem is that I believe I need to preserve the xml markup at the
paragraph and sentence levels, so I was hoping to create a content field that
could just contain the source xml for the paragraph or sentence respectively.
There are reasons for this that I won't go into -- a lot of granular work in
this app, accessing pars and sens.

Obviously an XML mechanism that could leverage the xml structure (via XPath or
XPointers) would work great.  Still, I think Lucene can do this in a field-level
way -- and I also can't imagine that users who are indexing XML documents have
to go through the trouble of stripping all the markup before indexing?
Hopefully I'm missing something basic?

It would be great to be pointed in the right direction on this matter.

I think I need something along this line:

<add>
  <doc>
<field name="foo1">value 1</field>
<field name="foo2">value 2</field>

<field name="content">an xml stream with embedded source markup</field>
  </doc>
</add>

Maybe the overall question -- is what is the best way to index XML content 
using SOLR -- is all this tag stripping really necessary?

Thanks for any help,

Dave





__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
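One common workaround, if the markup only needs to survive storage (rather than drive the index structure), is to escape the embedded tags inside the field value so the add message itself stays well-formed. A sketch with hypothetical field names:

```xml
<add>
  <doc>
    <field name="id">book1-p1-s1</field>
    <field name="book">book1</field>
    <!-- embedded <s>...</s> markup is escaped so the update XML stays well-formed -->
    <field name="content">&lt;s&gt;The first sentence of the first paragraph.&lt;/s&gt;</field>
  </doc>
</add>
```

Solr unescapes the value on parse, so the stored field returned at query time contains the literal <s>...</s> markup again; analysis for searching can strip or ignore the tags independently.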

Re: how to use PHP AND PHPS?

2007-11-07 Thread James liu
hmm

I found the error; it is my error, not about php and phps.

I used an old config to test, so the config had a problem:

for Title I used double as its type; it should use text.


On Nov 8, 2007 10:29 AM, James liu [EMAIL PROTECTED] wrote:

  php now is ok..

 but phps failed

 my code:

  <?php
  $url = 
  'http://localhost:8080/solr1/select/?q=2&version=2.2&rows=2&fl=Title&start=0&rows=10&indent=on&wt=phps
  ';
  $a = file_get_contents($url);
  //eval('$solrResults = ' .$serializedSolrResults . ';');
  echo '<b>before unserialize</b><br/>';
  var_dump($a);
  echo '<br/>';
  $a = unserialize($a);
  echo '<b>after unserialize...</b><br/>';
  var_dump($a);
  ?>
 

 and result:

  *before unserialize*
  string(434)
  a:2:{s:14:responseHeader;a:3:{s:6:status;i:0;s:5:QTime;i:32;s:6:params;a:7:{s:2:fl;s:5:Title;s:6:indent;s:2:on;s:5:start;s:1:0;s:1:q;s:1:2;s:2:wt;s:4:phps;s:4:rows;a:2:{i:0;s:1:2;i:1;s:2:10;}s:7:version;s:3:
  2.2;}}s:8:response;a:3:{s:8:numFound;i:28;s:5:start;i:0;s:4:docs;a:2:{i:0;a:1:{s:5:Title;d:诺基亚N-Gage基本数据;}i:1;a:1:{s:5:Title;d:索尼爱立信P908基本数据;
 
  *after unserialize...*
  bool(false)
 


 On Nov 7, 2007 9:30 PM, Dave Lewis [EMAIL PROTECTED] wrote:

 
  On Nov 7, 2007, at 2:04 AM, James liu wrote:
 
   i just decrease answer information...and u will see my result(full,
   not
   part)
  
   *before unserialize*
   string(433)
   a:2:{s:14:responseHeader;a:3:{s:6:status;i:0;s:5:QTime;i:
   0;s:6:params;a:7:{s:2:fl;s:5:Title;s:6:indent;s:2:on;s:
   5:start;s:1:0;s:1:q;s:1:2;s:2:wt;s:4:phps;s:4:rows;a:
   2:{i:0;s:1:2;i:1;s:2:10;}s:7:version;s:3:
   2.2;}}s:8:response;a:3:{s:8:numFound;i:28;s:5:start;i:0;s:
   4:docs;a:2:{i:0;a:1:{s:5:Title;d:诺基亚N-Gage基本数据;}i:1;a:1:
   {s:5:Title;d:索尼爱立信P908基本数据;
  
   *after unserialize...*
   bool(false)
  
  
   and i write serialize test code..
  
   <?php
   $ar = array (
   array('id' => 123, 'Title' => '中文测试'),
   array('id' => 123, 'Title' => '中国上海'),
   );
  
   echo serialize($ar);
  
   ?>
  
  
  
   and result is :
  
  
   a:2:{i:0;a:2:{s:2:id;i:123;s:5:Title;s:12:中文测试;}i:1;a:2:
   {s:2:id;i:123;s:5:Title;s:12:中国上海;}}
  
  
  
  
   *php* result is:
  
   string(369) array( 'responseHeader'=array( 'status'=0, 'QTime'=0,
   'params'=array( 'fl'='Title', 'indent'='on', 'start'='0',
   'q'='2',
   'wt'='php', 'rows'=array('2', '10'), 'version'='2.2')),
   'response'=array('numFound'=28,'start'=0,'docs'=array( array(
   'Title'=诺基亚N-Gage基本数据), array( 'Title'=索尼爱立信P908基本数
   据)) ))
  
   it is string, so i can't read it correctly by php.
  
  
 
 
  This part (after string(369)) is exactly what you should be seeing
  if you use the php handler, and it's what you get after you
  unserialize when using phps.
 
  You can access your search results as:
 
  $solrResults['response']['docs'];
 
  In your example above, that would be:
 
  array( array('Title'=诺基亚N-Gage基本数据), array( 'Title'=索尼爱立信
  P908基本数据))
 
  When using the php handler, you must do something like this:
 
  eval('$solrResults = ' .$serializedSolrResults . ';');
 
  Then, as above, you can access $solrResults['response']['docs'].
 
  To sum up, if you use phps, you must unserialize the results.  If you
  use php, you must eval the results (including some sugar to get a
  variable set to that value).
 
 
  dave
 
 
 


 --
 regards
 jl




-- 
regards
jl


Re: how to use PHP AND PHPS?

2007-11-07 Thread James liu
 php now is ok..

but phps failed

my code:

 <?php
 $url = '
 http://localhost:8080/solr1/select/?q=2&version=2.2&rows=2&fl=Title&start=0&rows=10&indent=on&wt=phps
 ';
 $a = file_get_contents($url);
 //eval('$solrResults = ' .$serializedSolrResults . ';');
 echo '<b>before unserialize</b><br/>';
 var_dump($a);
 echo '<br/>';
 $a = unserialize($a);
 echo '<b>after unserialize...</b><br/>';
 var_dump($a);
 ?>


and result:

 *before unserialize*
 string(434)
 a:2:{s:14:responseHeader;a:3:{s:6:status;i:0;s:5:QTime;i:32;s:6:params;a:7:{s:2:fl;s:5:Title;s:6:indent;s:2:on;s:5:start;s:1:0;s:1:q;s:1:2;s:2:wt;s:4:phps;s:4:rows;a:2:{i:0;s:1:2;i:1;s:2:10;}s:7:version;s:3:
 2.2;}}s:8:response;a:3:{s:8:numFound;i:28;s:5:start;i:0;s:4:docs;a:2:{i:0;a:1:{s:5:Title;d:诺基亚N-Gage基本数据;}i:1;a:1:{s:5:Title;d:索尼爱立信P908基本数据;

 *after unserialize...*
 bool(false)



On Nov 7, 2007 9:30 PM, Dave Lewis [EMAIL PROTECTED] wrote:


 On Nov 7, 2007, at 2:04 AM, James liu wrote:

  i just decrease answer information...and u will see my result(full,
  not
  part)
 
  *before unserialize*
  string(433)
  a:2:{s:14:responseHeader;a:3:{s:6:status;i:0;s:5:QTime;i:
  0;s:6:params;a:7:{s:2:fl;s:5:Title;s:6:indent;s:2:on;s:
  5:start;s:1:0;s:1:q;s:1:2;s:2:wt;s:4:phps;s:4:rows;a:
  2:{i:0;s:1:2;i:1;s:2:10;}s:7:version;s:3:
  2.2;}}s:8:response;a:3:{s:8:numFound;i:28;s:5:start;i:0;s:
  4:docs;a:2:{i:0;a:1:{s:5:Title;d:诺基亚N-Gage基本数据;}i:1;a:1:
  {s:5:Title;d:索尼爱立信P908基本数据;
 
  *after unserialize...*
  bool(false)
 
 
  and i write serialize test code..
 
  <?php
  $ar = array (
  array('id' => 123, 'Title' => '中文测试'),
  array('id' => 123, 'Title' => '中国上海'),
  );
 
  echo serialize($ar);
 
  ?>
 
 
 
  and result is :
 
 
  a:2:{i:0;a:2:{s:2:id;i:123;s:5:Title;s:12:中文测试;}i:1;a:2:
  {s:2:id;i:123;s:5:Title;s:12:中国上海;}}
 
 
 
 
  *php* result is:
 
  string(369) array( 'responseHeader'=array( 'status'=0, 'QTime'=0,
  'params'=array( 'fl'='Title', 'indent'='on', 'start'='0',
  'q'='2',
  'wt'='php', 'rows'=array('2', '10'), 'version'='2.2')),
  'response'=array('numFound'=28,'start'=0,'docs'=array( array(
  'Title'=诺基亚N-Gage基本数据), array( 'Title'=索尼爱立信P908基本数
  据)) ))
 
  it is string, so i can't read it correctly by php.
 
 


 This part (after string(369)) is exactly what you should be seeing
 if you use the php handler, and it's what you get after you
 unserialize when using phps.

 You can access your search results as:

 $solrResults['response']['docs'];

 In your example above, that would be:

 array( array('Title'=诺基亚N-Gage基本数据), array( 'Title'=索尼爱立信
 P908基本数据))

 When using the php handler, you must do something like this:

 eval('$solrResults = ' .$serializedSolrResults . ';');

 Then, as above, you can access $solrResults['response']['docs'].

 To sum up, if you use phps, you must unserialize the results.  If you
 use php, you must eval the results (including some sugar to get a
 variable set to that value).


 dave





-- 
regards
jl


Re: What is the best way to index xml data preserving the mark up?

2007-11-07 Thread Norberto Meijome
On Wed, 7 Nov 2007 20:18:25 -0800 (PST)
David Neubert [EMAIL PROTECTED] wrote:

 I am sure this is a 101 question, but I am a bit confused about indexing xml data 
 using SOLR.
 
 I have rich xml content (books) that needs to be searched at granular levels 
 (specifically paragraph and sentence levels very accurately, no 
 approximations).  My source text has exact <p></p> and <s></s> tags for this 
 purpose.  I have built this app in previous versions (using other search 
 engines) indexing the text twice, (1) where every paragraph was a virtual 
 document and (2) where every sentence was a virtual document  -- both 
 extracted from the source file (which was a single xml file for the entire 
 book).  I have of course thought about using an XML engine eXists or 
 Xindices, but I prefer the stability, user base, and performance that 
 Lucene/SOLR seems to have, and also there is a large body of text that is 
 regular documents and not well formed XML as well.
 
 I am brand new to SOLR (one day) and at a basic level understand SOLR's nice 
 simple xml scheme to add documents:
 
 <add>
   <doc>
 <field name="foo1">foo value 1</field>
 <field name="foo2">foo value 2</field>
   </doc>
   <doc>...</doc>
 </add>
 
 But my problem is that I believe I need to preserve the xml markup at the 
 paragraph and sentence levels, so I was hoping to create a content field that 
 could just contain the source xml for the paragraph or sentence respectively. 
  There are reasons for this that I won't go into -- a lot of granular work in 
 this app, accessing pars and sens.
 
 Obviously an XML mechanism that could leverage the xml structure (via XPath 
 or XPointers) would work great.  Still, I think Lucene can do this in a 
 field-level way -- and I also can't imagine that users who are indexing XML 
 documents have to go through the trouble of stripping all the markup before 
 indexing?  Hopefully I'm missing something basic?
 
 It would be great to be pointed in the right direction on this matter.
 
 I think I need something along this line:
 
 <add>
   <doc>
 <field name="foo1">value 1</field>
 <field name="foo2">value 2</field>
 
 <field name="content">an xml stream with embedded source markup</field>
   </doc>
 </add>
 
 Maybe the overall question -- is what is the best way to index XML content 
 using SOLR -- is all this tag stripping really necessary?

crazy/silly idea maybe... could you use dynamic fields, each containing a 
sentence, and a reference to the paragraph it belongs to ? 
e.g. (not sure if the syntax is correct):

<dynamicField name="s_*" type="string" />

Then when you create your document you can define
<doc>
  <field name="s_1_p1">{Sentence #1, Para#1}</field>
  <field name="s_2_p1">{Sentence #2, Para#1}</field>
  <field name="s_3_p1">{Sentence #3, Para#1}</field>
  <field name="s_1_p2">{Sentence #1, Para#2}</field>
[...]
</doc>

I have no idea how scalable that would be. 
cheers,
B
_
{Beto|Norberto|Numard} Meijome

Immediate success shouldn't be necessary as a motivation to do the right thing.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Can you parse the contents of a field to populate other fields?

2007-11-07 Thread George Everitt
I'm not sure I fully understand your ultimate goal or Yonik's  
response.  However, in the past I've been able to represent  
hierarchical data as a simple enumeration of delimited paths:


<field name="taxonomy">root</field>
<field name="taxonomy">root/region</field>
<field name="taxonomy">root/region/north america</field>
<field name="taxonomy">root/region/south america</field>

Then, at response time, you can walk the result facet and build a  
hierarchy with counts that can be put into a tree view.  The tree can  
be any arbitrary depth, and documents can live in any combination of  
nodes on the tree.


In addition, you can represent any arbitrary name value pair  
(attribute/tuple) as a two level tree.   That way, you can put any  
combination of attributes in the facet and parse them out at results  
list time.  For example, you might be indexing computer hardware.
Memory, Bus Speed and Resolution may be valid for some objects but not  
for others.   Just put them in a facet and specify a separator:


<field name="attribute">memory:1GB</field>
<field name="attribute">busspeed:133Mhz</field>
<field name="attribute">voltage:110/220</field>
<field name="attribute">manufacturer:Shiangtsu</field>


When you do a facet query, you can easily display the categories  
appropriate to the object.  And do facet selections like show me all  
green things and show me all size 4 things.
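A sketch of the response-time step George describes — folding the flat facet paths (plus their counts) back into a nested tree for display. The path values and counts are hypothetical:

```python
# Facet counts as Solr would return them for the "taxonomy" field:
# one flat entry per delimited path.
facet_counts = {
    "root": 12,
    "root/region": 9,
    "root/region/north america": 5,
    "root/region/south america": 4,
}

def build_tree(counts):
    """Fold delimited paths into nested {name: {"count", "children"}} dicts."""
    tree = {}
    for path, count in counts.items():
        children = tree
        for part in path.split("/"):
            node = children.setdefault(part, {"count": 0, "children": {}})
            children = node["children"]
        node["count"] = count  # count attaches to the path's final segment
    return tree

tree = build_tree(facet_counts)
print(tree["root"]["count"])  # 12
print(tree["root"]["children"]["region"]["children"]["north america"]["count"])  # 5
```

The same walk works for the attribute:value two-level variant by splitting on ":" instead of "/".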



Even if that's not your goal, this might help someone else.


George Everitt







On Nov 7, 2007, at 3:15 PM, Kristen Roth wrote:

So, I think I have things set up correctly in my schema, but it  
doesn't

appear that any logic is being applied to my Category_# fields - they
are being populated with the full string copied from the Category  
field

(facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc.

I have several different field types, each with a different regex to
match a specific part of the input string.  In this example, I'm
matching facet1 in input string facet1::facet2::facet3...facetn

   <fieldtype name="cat1str" class="solr.TextField">
     <analyzer type="index">
       <tokenizer class="solr.PatternTokenizerFactory"
           pattern="^([^:]+)" group="1"/>
     </analyzer>
   </fieldtype>
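
One thing worth checking (hedged, since the rest of the config isn't
shown): an analyzer only changes the *indexed* terms - the *stored*
value of a field is always the verbatim input, so inspecting stored
Category_# values would show the full string even when the tokenizer
is working. The regex itself does extract the first level; a standalone
sketch of what group=1 of that pattern yields:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CategoryRegex {
    /** Apply the same pattern as the cat1str tokenizer, ^([^:]+),
     *  and return capture group 1 (everything before the first ':'). */
    static String firstLevel(String category) {
        Matcher m = Pattern.compile("^([^:]+)").matcher(category);
        return m.find() ? m.group(1) : null;
    }
}
```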

I have copyfields set up for each Category_# field.  Anything obviously
wrong?

Thanks!
Kristen

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, November 07, 2007 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Can you parse the contents of a field to populate other
fields?

On 11/6/07, Kristen Roth [EMAIL PROTECTED] wrote:

Yonik - thanks so much for your help!  Just to clarify; where should

the

regex go for each field?


Each field should have a different FieldType (referenced by the "type"
XML attribute).  Each fieldType can have its own analyzer.  You can
use a different PatternTokenizer (which specifies a regex) for each
analyzer.

-Yonik





Timeout in remote streaming

2007-11-07 Thread Guangwei Yuan
Hi,

I'm sending a local csv file to Solr via remote streaming, and constantly
get the 500 read timeout message. The csv file is about 200MB in size, and
Solr is running on Tomcat 5.5. What timeout-related Tomcat params can I
adjust to fix this?
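
(For reference, the knob that usually governs this in Tomcat 5.5 is the
Connector element in conf/server.xml; connectionTimeout is in
milliseconds, and the values below are only illustrative:)

```xml
<!-- conf/server.xml: raise the connector timeout for long-running
     streaming uploads; 600000 ms = 10 minutes (illustrative value) -->
<Connector port="8080" maxHttpHeaderSize="8192"
           connectionTimeout="600000"
           disableUploadTimeout="true" />
```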

Thanks in advance.

- Guangwei


Re: What is the best way to index xml data preserving the mark up?

2007-11-07 Thread David Neubert
Thanks Walter -- 

I am aware of MarkLogic -- and agree -- but I have a very low budget for
licensed software in this case (near 0) --

Have you used eXist or Xindice?

Dave

- Original Message 
From: Walter Underwood [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, November 7, 2007 11:37:38 PM
Subject: Re: What is the best way to index xml data preserving the mark up?

If you really, really need to preserve the XML structure, you'll
be doing a LOT of work to make Solr do that. It might be cheaper
to start with software that already does that. I recommend
MarkLogic -- I know the principals there, and it is some seriously
fine software. Not free or open, but very, very good.

If your problem can be expressed in a flat field model, then your
problem is mapping your document model into Solr. You might
be able to use structured field names to represent the XML context,
but that is just a guess.

With a mixed corpus of XML and arbitrary text, requiring special
handling of XML, yow, that's a lot of work.

One thought -- you can do flat fields in an XML engine (like MarkLogic)
much more easily than you can do XML in a flat field engine (like
Lucene).

wunder

On 11/7/07 8:18 PM, David Neubert [EMAIL PROTECTED] wrote:

 I am sure this is a 101 question, but I am a bit confused about indexing
 xml data
 using SOLR.
 
 I have rich xml content (books) that need to searched at granular
 levels
 (specifically paragraph and sentence levels very accurately, no
 approximations).  My source text has exact p/p and s/s tags
 for this
 purpose.  I have built this app in previous versions (using other
 search
 engines) indexing the text twice, (1) where every paragraph was a
 virtual
 document and (2) where every sentence was a virtual document  -- both
 extracted from the source file (which was a singe xml file for the
 entire
 book).  I have of course thought about using an XML engine like eXist or
 Xindice,
 but I prefer the stability and user base and performance that
 Lucene/SOLR seems to have, and also there is a large body of text
 that is
 regular documents and not well formed XML as well.
 
 I am brand new to SOLR (one day) and at a basic level understand
 SOLR's nice
 simple xml scheme to add documents:
 
 <add>
   <doc>
     <field name="foo1">foo value 1</field>
     <field name="foo2">foo value 2</field>
   </doc>
   <doc>...</doc>
 </add>
 
 But my problem is that I believe I need to preserve the xml markup at
 the
 paragraph and sentence levels, so I was hoping to create a content
 field that
 could just contain the source xml for the paragraph or sentence
 respectively.
 There are reasons for this that I won't go into -- a lot of granular
 work in
 this app, accessing pars and sens.
 
 Obviously an XML mechanism that could leverage the xml structure (via
 XPath or
 XPointers) would work great.  Still I think Lucene can do this in a
 field
 level way-- and I also can't imagine that users who are indexing XML
 documents
 have to go through the trouble of stripping all the markup before
 indexing?
 Hopefully I'm missing something basic?
 
 It would be great to be pointed in the right direction on this matter.
 
 I think I need something along this line:
 
 <add>
   <doc>
     <field name="foo1">value 1</field>
     <field name="foo2">value 2</field>
     <field name="content">an xml stream with embedded source
     markup</field>
   </doc>
 </add>
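
(A side note on the example above: if raw markup is stored as the text
content of a field, the XML special characters have to be escaped in
the add message. A minimal, self-contained sketch of that escaping --
the class name and sample strings are illustrative:)

```java
public class XmlFieldEscape {
    /** Escape XML special characters so raw markup can be embedded
     *  as the text content of a <field> element in an add message. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");  break;
                case '>':  sb.append("&gt;");  break;
                case '&':  sb.append("&amp;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```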
 
 Maybe the overall question is: what is the best way to index XML
 content
 using SOLR -- is all this tag stripping really necessary?
 
 Thanks for any help,
 
 Dave
 
 
 
 
 
 __
 Do You Yahoo!?
 Tired of spam?  Yahoo! Mail has the best spam protection around
 http://mail.yahoo.com 







Re: MultiCore unregister

2007-11-07 Thread John Reuning
I was hoping that a feature was lurking about and not yet added to the 
patch.  How about something like this?  Should it throw an exception if 
the core isn't found in the map?


Thanks,

-jrr


--- MultiCore.java.orig 2007-11-07 23:09:32.0 -0500
+++ MultiCore.java  2007-11-07 23:14:08.0 -0500
@@ -125,6 +125,25 @@
 }
   }

+  /**
+   * Stop and unregister a core of the given name
+   *
+   * @param name
+   */
+  public void shutdown ( String name )
+  {
+if ( name == null || name.length() == 0 ) {
+  throw new RuntimeException("Invalid core name.");
+}
+synchronized ( cores ) {
+  SolrCore core = cores.get(name);
+  if ( core != null ) {
+cores.remove(name);
+core.close();
+  }
+}
+  }
+
   @Override
   protected void finalize() {
 shutdown();


Ryan McKinley wrote:

Nothing yet... but check:
https://issues.apache.org/jira/browse/SOLR-350

ryan


John Reuning wrote:
For the MultiCore experts, is there an acceptable or approved way to 
close and unregister a single SolrCore?  I'm interested in stopping 
cores, manipulating the solr directory tree, and reregistering them.


Thanks,

-John R.