[Libreoffice-bugs] [Bug 44681] EasyHack: port to CLucene from java/Lucene ...

2012-02-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44681

Caolán McNamara caol...@redhat.com changed:

   What|Removed |Added

 Status|REOPENED|ASSIGNED
 AssignedTo|libreoffice-b...@lists.freedesktop.org |caol...@redhat.com

--- Comment #6 from Caolán McNamara caol...@redhat.com 2012-02-20 13:12:39 PST ---
set this as assigned as it's in-progress

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
___
Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs


[Libreoffice-bugs] [Bug 44681] EasyHack: port to CLucene from java/Lucene ...

2012-02-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44681

Michael Meeks michael.me...@novell.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||DUPLICATE

--- Comment #4 from Michael Meeks michael.me...@novell.com 2012-02-17 13:20:54 PST ---
so marking a dup of the remove stdlibs bug.

*** This bug has been marked as a duplicate of bug 46246 ***



[Libreoffice-bugs] [Bug 44681] EasyHack: port to CLucene from java/Lucene ...

2012-02-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44681

Michael Meeks michael.me...@novell.com changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|DUPLICATE   |

--- Comment #5 from Michael Meeks michael.me...@novell.com 2012-02-17 13:21:44 PST ---
urk - terribly sorry, wrong bug ...



[Libreoffice-bugs] [Bug 44681] EasyHack: port to CLucene from java/Lucene ...

2012-02-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44681

--- Comment #2 from Caolán McNamara caol...@redhat.com 2012-02-10 07:28:53 PST ---
oh shiny, that's *very* encouraging. With a bit of luck this could take hours 
off my multi-language build times :-)

Well, first off, I reckon it's best to mail your code-to-date and your questions
again to the general development list libreoff...@lists.freedesktop.org to get
better and wider feedback.

but here are my guesses.

 * Lucene 2.3 is used, but the CLucene stable version is Lucene 1.9.1
   compatible. The developers recommend using the Git version, which *is*
   compatible with 2.3. Is that OK?

probably yeah

 * How exactly is HelpIndexerTool used? I believe both as a command-line tool
   (as part of a ??? - HelpLinker - HelpIndexer - ???) chain in the build
   process, and as a run-time component to index help for extensions. Is it
   desired to keep HelpIndexer as a stand-alone command-line tool, or is that
   just because it is a Java component currently?

I think it's desirable for it to stay a standalone command-line tool. It gets
used when building the helpcontent2 module, which is really slow with lots of
enabled languages.

   Currently, I ported most of the first part (for Japanese there is a special
   Analyzer, which I don't know how to test, and there are a bunch of options
   to check certain things, of which I'm not sure whether they're ever used).

The CJKAnalyzer comes with Java Lucene, so it's a special one, but not a custom
one belonging to us. *Presumably* this means that the Lucene world knows about
any potential gotchas in converting uses of it to CLucene. I'm not exactly sure
what it does beyond the generic one, but we've got some Japanese readers who
should be able to read the final output of a conversion to see if the quality
is sufficient.
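For context on what a CJK-specific analyzer adds: as far as I understand Lucene's documentation, the tokenizer behind CJKAnalyzer splits runs of CJK characters into overlapping two-character tokens (bigrams), because Japanese text has no spaces between words. The sketch below only illustrates that bigram idea in standard C++; it is not the CLucene API, the names `isCjk` and `cjkBigrams` are hypothetical, and the CJK range test is a simplified assumption covering only a few Unicode blocks.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical test: treats Hiragana, Katakana and the basic
// CJK Unified Ideographs block as CJK. A real analyzer covers more ranges.
static bool isCjk(char32_t c) {
    return (c >= 0x3040 && c <= 0x30FF)    // Hiragana + Katakana
        || (c >= 0x4E00 && c <= 0x9FFF);   // CJK Unified Ideographs
}

// Emit overlapping bigrams for each run of CJK characters; a run of a single
// CJK character is emitted as one token. Non-CJK characters only act as
// separators in this sketch (a real analyzer would also index them as words).
std::vector<std::u32string> cjkBigrams(const std::u32string& text) {
    std::vector<std::u32string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        if (!isCjk(text[i])) { ++i; continue; }
        const std::size_t start = i;
        while (i < text.size() && isCjk(text[i])) ++i;   // find end of run
        if (i - start == 1) {
            tokens.push_back(text.substr(start, 1));
        } else {
            for (std::size_t k = start; k + 1 < i; ++k)
                tokens.push_back(text.substr(k, 2));     // overlapping pairs
        }
    }
    return tokens;
}
```

For example, a three-character run such as 日本語 yields the two tokens 日本 and 本語. That overlap is why a reviewer who reads Japanese is useful here: whether the CLucene port reproduces the same token stream is easiest to judge from search results.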

   Question: does creating the ZIP need to be part of this? If so, what is the
   best way to create the archive?

Back in the day, the last time I converted the original Java HelpLinker to C++,
I *cough* just spawned off perl to do the zipping, e.g. see
JarOutputStream::JarOutputStream in
http://people.redhat.com/caolanm/ooocvs/workspace.helplinker01.patch
(you could grab and re-use that). In the longer run we might expose some more
stuff from package/inc to export a simple zip API, but using the (silly)
JarOutputStream would do for now.
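The spawn-an-external-process approach can be sketched in a few lines of standard C++. This is not the code from the linked patch: it swaps in the Info-ZIP `zip` command (`-r` recurse, `-q` quiet) in place of perl, assumes a POSIX shell with `zip` on the PATH, and the helper names `shellQuote`, `buildZipCommand` and `zipIndexDir` are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// POSIX-shell single-quote escaping: wrap in '...' and replace each embedded
// ' with '\'' so paths containing spaces or quotes survive the shell.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Hypothetical helper: build a command that zips the contents of indexDir
// into archivePath with Info-ZIP `zip`. archivePath should be absolute,
// because the command changes into indexDir first.
std::string buildZipCommand(const std::string& indexDir,
                            const std::string& archivePath) {
    return "cd " + shellQuote(indexDir) + " && zip -q -r " +
           shellQuote(archivePath) + " .";
}

// Run the command through the shell; true if it reported exit status 0.
bool zipIndexDir(const std::string& indexDir, const std::string& archivePath) {
    const std::string cmd = buildZipCommand(indexDir, archivePath);
    return std::system(cmd.c_str()) == 0;
}
```

Keeping the command construction separate from the `std::system` call makes the quoting testable without actually spawning anything; a later in-process zip API from package/inc could replace `zipIndexDir` without touching callers.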


 * I'm assuming that CLucene will *always* be compiled with TCHAR defined as
   wchar_t. This is because of my ignorance of how one does portable wide 
   strings in LibreOffice. Please enlighten me.

Presumably this will just work. FWIW, we have an 8-bit code unit rtl::OString
and a UTF-16 rtl::OUString class in LibreOffice. I'm not sure if we need to
bridge from these to whatever CLucene uses at any point, but I'm sure it's
doable if necessary.
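If such a bridge is ever needed, the main wrinkle is that `wchar_t` is 16 bits on Windows but 32 bits on Linux and most Unix-likes. UTF-16 code units from an rtl::OUString can therefore be copied directly on Windows, but surrogate pairs have to be combined elsewhere. A minimal sketch in standard C++ follows; `std::u16string` stands in for the OUString buffer (an assumption, the rtl API is not used), and `utf16ToWide` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 code units to a std::wstring. When wchar_t is 16 bits
// (e.g. Windows), the code units are copied unchanged. When wchar_t is
// 32 bits, surrogate pairs are combined into one code point and unpaired
// surrogates are replaced with U+FFFD.
std::wstring utf16ToWide(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    if (sizeof(wchar_t) == 2) {
        for (char16_t c : in) out.push_back(static_cast<wchar_t>(c));
        return out;
    }
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        const bool high = c >= 0xD800 && c <= 0xDBFF;
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            const char32_t low = in[++i];   // consume the low surrogate
            c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;                     // unpaired surrogate
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

This keeps the TCHAR-as-wchar_t assumption contained in one conversion point, so the rest of the indexer does not need to care about the platform's `wchar_t` width.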

 * How to incorporate the CLucene dependency in the build process?

It's sort of tricky to do this, but there are plenty of examples, e.g. see the
libwpd, libcdr, or hunspell dirs, which are special modules that build extra
dependencies. Basically, don't worry about this bit: get it converted to
CLucene, and with some luck someone else will handle figuring out how to build
CLucene itself as part of our build.



[Libreoffice-bugs] [Bug 44681] EasyHack: port to CLucene from java/Lucene ...

2012-02-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44681

--- Comment #3 from Gert van Valkenhoef g.h.m.van.valkenh...@rug.nl 2012-02-10 09:31:05 PST ---
Thanks for the comments. I'll look into your suggestions and then send the next 
version to the general list.



[Libreoffice-bugs] [Bug 44681] EasyHack: port to CLucene from java/Lucene ...

2012-02-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44681

--- Comment #1 from Gert van Valkenhoef g.h.m.van.valkenh...@rug.nl 2012-02-09 13:01:26 PST ---
Created attachment 56833
  -- https://bugs.freedesktop.org/attachment.cgi?id=56833
Partial implementation (indexing)

I wrote a C++ port of the Java-based HelpIndexerTool (or, actually, only the
part that does the indexing using Lucene, with a minimal main() that just
indexes a hard-coded test directory). Before I go forward, there are a number
of questions about how this will be used and what the constraints are:

 * Lucene 2.3 is used, but the CLucene stable version is Lucene 1.9.1
   compatible. The developers recommend using the Git version, which *is*
   compatible with 2.3. Is that OK?

 * How exactly is HelpIndexerTool used? I believe both as a command-line tool
   (as part of a ??? - HelpLinker - HelpIndexer - ???) chain in the build
   process, and as a run-time component to index help for extensions. Is it
   desired to keep HelpIndexer as a stand-alone command-line tool, or is that
   just because it is a Java component currently?

 * HelpIndexerTool does two things:

   - Index help files using Lucene, producing intermediate files

   - Bundle the intermediate files into a ZIP archive

   Currently, I ported most of the first part (for Japanese there is a special
   Analyzer, which I don't know how to test, and there are a bunch of options
   to check certain things, of which I'm not sure whether they're ever used).

   Question: does creating the ZIP need to be part of this? If so, what is the
   best way to create the archive?

 * HelpFileDocument is a simple support class that produces a Lucene Document
   for a given help file. This is fully ported (as the helpDocument method).

 * I'm assuming that CLucene will *always* be compiled with TCHAR defined as
   wchar_t. This is because of my ignorance of how one does portable wide 
   strings in LibreOffice. Please enlighten me.

 * How to incorporate the CLucene dependency in the build process?

The attached code is contributed under the LGPLv3+ / MPL.



[Libreoffice-bugs] [Bug 44681] EasyHack: port to CLucene from java/Lucene ...

2012-01-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44681

Michael Meeks michael.me...@novell.com changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
  Status Whiteboard||EasyHack,DifficultyInteresting,SkillCpp,TopicCleanup
 Ever Confirmed|0   |1
