Licensing Statement for LibreOffice contributions

2012-05-10 Thread Steven Butler
Hi,

All of my past and future contributions to LibreOffice may be licensed under
the MPL/LGPLv3+ dual license.

-- 
Regards,
Steven Butler
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: [Libreoffice] building index at Win32 install time ...

2011-02-19 Thread Steven Butler
Hi All,

I had a bit of time to work on the install time index generation, so
I've created a patch that implements a custom action for the MSI
installer which I think will do just that once hooked in.

It is as yet untested inside MSI, as I'm not sure which files to edit in
the MSI templates to patch it in (I have never developed anything for
MSI before).

I did test it by pulling it into a command line tool that I observed
to generate all the required index files when pointed at a pseudo-
libreoffice install with a set of dict-xx directories.

I wasn't sure what the best approach was to finding out which
dictionaries need indexing as I'm not sure what can be gained out of
the MSI database.  In the end I looked at a bunch of the other custom
actions and decided to just do a mini-find from the LibreOffice
install path for the dictionary extensions and generate all the idx
files for any dat file found.  I thought it may be possible to do a
query on the MSI database and determine all the .dat files.
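
As a rough illustration of the "mini-find" idea (this is not the code
from the attached patch, which would use the Win32/MSI APIs), a portable
sketch using C++17 std::filesystem might look like the following.  The
function names, the .dat to .idx naming rule and the directory layout
are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Collect every *.dat file below installRoot (for example inside the
// dict-xx directories). The layout is an assumption, not taken from the patch.
std::vector<fs::path> findDatFiles(const fs::path& installRoot)
{
    std::vector<fs::path> found;
    if (!fs::is_directory(installRoot))
        return found;                       // nothing installed here
    for (const fs::directory_entry& entry :
         fs::recursive_directory_iterator(installRoot))
    {
        if (entry.is_regular_file() && entry.path().extension() == ".dat")
            found.push_back(entry.path());
    }
    std::sort(found.begin(), found.end());  // deterministic processing order
    return found;
}

// Assumed naming rule: the index sits next to its data file, foo.dat -> foo.idx.
fs::path indexPathFor(const fs::path& datFile)
{
    assert(datFile.extension() == ".dat");
    fs::path idx = datFile;
    idx.replace_extension(".idx");
    return idx;
}
```

An installer action built this way would loop over findDatFiles() and
run the index generator on each result, writing to indexPathFor().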

Assuming this is a reasonable approach, what remains is to hook the
custom action into the MSI installer process and then test it a bit.

-- 
Regards,
Steven Butler


0001-Add-MSI-custom-action-capable-of-generating-thesauru.patch
Description: Binary data


Re: [Libreoffice] building index at Win32 install time ...

2011-02-19 Thread Steven Butler
Hi Again,

I forgot to add this change to have it build the custom action.

All changes are MPL/LGPL as per standard libreoffice licensing terms.

Regards
Steven Butler


0001-MSI-ensure-thesaurus-indexer-custom-action-is-built.patch
Description: Binary data


[Libreoffice] [PATCH] Some build breakage fixes for MSVC2008

2011-02-12 Thread Steven Butler
Hi all,

I was finally able to get a successful build and package to complete
on Windows using MSVC2008 Express.

I was not able to do this on my windows 7 64 bit machine but this may
be due to the diversity of development environments I had previously
installed on it.

I worked around the problem by using a Windows XP Mode VM to do a
fresh install of the tool chain.

I had to make the following small patches to get the build to work.

sw_compile.patch does a nasty hack to make a member variable mutable
because it is referenced all over the place using non const iterators.

I didn't want to break anything so I took an easy out and made it
mutable but I would think it would be better fixed by using const_cast
wherever a non-const iterator is created.
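
To make the trade-off concrete, here is a minimal sketch of the two
options.  It assumes a std::set-like container whose iterators only give
const access to the elements; the Heading type and both helper functions
are hypothetical and are not the code from sw.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical element type; the real class touched by sw_compile.patch
// is not shown in the thread.
struct Heading
{
    std::string name;   // used for ordering, must not change while in the set
    mutable int level;  // the "mutable" approach: writable through const access

    Heading(const std::string& n, int l) : name(n), level(l) {}
    bool operator<(const Heading& other) const { return name < other.name; }
};

// Approach described for the patch: rely on the mutable member.
void setLevelViaMutable(const Heading& h, int level)
{
    h.level = level;
}

// Alternative suggested in the mail: cast away const at the point of use.
// This is only safe when the modified field does not affect the ordering.
void setLevelViaConstCast(const Heading& h, int level)
{
    const_cast<Heading&>(h).level = level;
}

void example()
{
    std::set<Heading> headings;
    headings.insert(Heading("Introduction", 1));
    setLevelViaMutable(*headings.begin(), 2);
    assert(headings.begin()->level == 2);
    setLevelViaConstCast(*headings.begin(), 3);
    assert(headings.begin()->level == 3);
}
```

The mutable member is a one-line change but applies everywhere; the
const_cast keeps the class strict at the cost of touching every call site.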

sw_const_fix.patch addresses a couple of const_iterator issues in unochart.cxx.

unopkg_app_stl_fix.patch fixes apparent breakage in the use of fill_n
- my reference says it returns void but the original code was
expecting an iterator returned.  I imagine it must be an STLport
extension, but it is broken with the built in STL and I believe the
fix is correct.
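
For reference, std::fill_n returns void in C++03 and only returns an
iterator from C++11 onwards, so code that needs the end position has to
compute it itself when building against an older standard library.  A
small sketch of that pattern follows; the helper name fillAndAdvance is
made up for the example and is not what the patch uses.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Fill `count` elements starting at `first` and return the position just
// past the last element written. Works whether std::fill_n returns void
// (C++03) or an iterator (C++11 and later), because the result is ignored.
// Requires at least a forward iterator.
template <typename ForwardIt, typename T>
ForwardIt fillAndAdvance(ForwardIt first,
                         typename std::iterator_traits<ForwardIt>::difference_type count,
                         const T& value)
{
    assert(count >= 0);
    std::fill_n(first, count, value);
    std::advance(first, count);
    return first;
}
```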

Regards
Steven Butler


sw_compile.patch
Description: Binary data


sw_const_fix.patch
Description: Binary data


unopkg_app_stl_fix.patch
Description: Binary data


[Libreoffice] [PUSHED] Re: [PATCH] Some build breakage fixes for MSVC2008

2011-02-12 Thread Steven Butler
Hi Fridrich,

 On Sat, 2011-02-12 at 20:54 +1000, Steven Butler wrote:

 unopkg_app_stl_fix.patch fixes apparent breakage in the use of fill_n

 Thanks for your precious work. BTW, do the smoketests run for you? Here
 they run, but there is a crash on exit :(

I had to disable some unit tests to get through the build as
cppunittester was crashing on exit.  I'm not used to dev on windows so
I couldn't figure out how to get any trace out of the crash.  It may
have been a crash on exit because after I sent off my mail I found my
build of libreoffice crashes in every application when you quit, but
otherwise appears stable in use - I mainly focused on the areas I had
patched in sw with the document list heading and numbering levels and
didn't actually quit it until I'd sent off the patches.

I didn't run the make check target.  My main goal over the past 2
weeks was just to get a windows build to build.

Regards
Steven Butler


Re: [Libreoffice] win32 build errors in svtools modules

2011-02-06 Thread Steven Butler
On 6 February 2011 15:32, Steven Butler sebut...@gmail.com wrote:

 I will see how much further into the build I can get now svtools is building.


I am now trying to build framework and it all compiles okay but I'm
getting really strange undefined symbol errors that I can't figure
out.

   Creating library ../wntmsci12.pro/lib/ifwi.lib and object
../wntmsci12.pro/lib/ifwi.exp
ifwi.exp : error LNK2001: unresolved external symbol _real@3f80
ifwi.exp : error LNK2001: unresolved external symbol _real@41efffe0
ifwi.exp : error LNK2001: unresolved external symbol _real@41f0
../wntmsci12.pro/bin/fwimi.dll : fatal error LNK1120: 3 unresolved externals
dmake:  Error code 2, while making '../wntmsci12.pro/bin/fwimi.dll'

I haven't been able to figure out where these symbols come from but
they're defined as R in a number of files in the slo directory and are
present in items like converter.obj

I tried compiling with build debug=true after removing the
wntmsci12.pro directory but I don't think it gave me any more
information - I don't see any debug symbols.

Does anyone have any idea what these symbols are and where they should
be defined, or failing that how to get one or two of the obj files to
build with symbols?

Is there a method for tracing these symbols back to their source?
Normally the symbols have a bit more meaning than _real.
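
One possible lead, offered as an assumption rather than a diagnosis:
MSVC names the read-only constants it emits for floating-point literals
"__real@" followed by the hex IEEE-754 bit pattern of the value.  If
that is what these symbols are, the hex part can be decoded as sketched
below.  The bit patterns used in example() are illustrative and are not
taken from the linker output above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision value.
float floatFromBits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(bits), "float must be 32 bits");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Reinterpret a 64-bit pattern as an IEEE-754 double-precision value.
double doubleFromBits(std::uint64_t bits)
{
    static_assert(sizeof(double) == sizeof(bits), "double must be 64 bits");
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

void example()
{
    // Illustrative patterns only: 1.0f and 2^32 as a double.
    assert(floatFromBits(0x3f800000u) == 1.0f);
    assert(doubleFromBits(0x41f0000000000000ull) == 4294967296.0);
}
```

Values of this kind would point at implicit float or double conversions
in the translation unit rather than at a named function.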

I have built the same git c091d6adaf73e913264a890665ce2315c002851b
commit of framework under Ubuntu without issue and there's no sign of
these odd _real@XXX symbols.

Regards
Steven Butler


Re: [Libreoffice] win32 build errors in svtools modules

2011-02-06 Thread Steven Butler
On 6 February 2011 21:28, Steven Butler sebut...@gmail.com wrote:

   Creating library ../wntmsci12.pro/lib/ifwi.lib and object
 ../wntmsci12.pro/lib/ifwi.exp
 ifwi.exp : error LNK2001: unresolved external symbol _real@3f80
 ifwi.exp : error LNK2001: unresolved external symbol _real@41efffe0
 ifwi.exp : error LNK2001: unresolved external symbol _real@41f0
 ../wntmsci12.pro/bin/fwimi.dll : fatal error LNK1120: 3 unresolved externals
 dmake:  Error code 2, while making '../wntmsci12.pro/bin/fwimi.dll'

I used a process of elimination in classes/converter.cxx to find out
which method was inserting these symbols.

It is this one below, and the symbol seems to come from the
OUStringHash used because without a local variable of type
OUStringHash in the file I don't see the strange _real@xxx symbol.

I am still mystified as to why this is happening, especially since
there is a function directly above it that also uses OUStringHash (as
an input parameter) and doesn't cause the same thing!

OUStringHash Converter::convert_seqProp2OUStringHash( const css::uno::Sequence< css::beans::PropertyValue >& lSource )
{
    OUStringHash lDestination;
    sal_Int32 nCount = lSource.getLength();
    const css::beans::PropertyValue* pSource = lSource.getConstArray();
    for (sal_Int32 nItem=0; nItem<nCount; ++nItem)
    {
        pSource[nItem].Value >>= lDestination[pSource[nItem].Name];
    }
    return lDestination;
}

Regards
Steven Butler


Re: [Libreoffice] Problems with windows build

2011-02-05 Thread Steven Butler
Hi Jesús

2011/2/5 Jesús Corrius je...@softcatala.org:
 The stack trace of the crash was the one Tor mentioned previously. So
 I guess it's just a matter of fixing this and we can announce support
 for this compiler.

I ended up uninstalling VS2010 and it seems to have built now (but I
have other issues now).

Thanks for your help.

Regards
Steve


Re: [Libreoffice] win32 build errors in svtools modules

2011-02-05 Thread Steven Butler
On 5 February 2011 21:46, Steven Butler sebut...@gmail.com wrote:

 I am stuck on the following error when trying to build svtools.  My
 previous issue with saxparser and climaker seems to have disappeared
 after uninstalling VS2010 C# Express and pulling in latest git.

 _STL::list<class Link,class _STL::allocator<class Link> >::push_back(class Link const &)

Replacing the use of std::list in htmlcfg.cxx with std::vector
resolved the multiply defined symbol error.

I have no idea why this should help as I would have thought using the
same template instantiation in two different files would not cause an
issue but apparently it does for MSVC2008.  At least for me.

I will see how much further into the build I can get now svtools is building.

Regards
Steve


Re: [Libreoffice] Problems with windows build

2011-02-04 Thread Steven Butler
 Hi Steven,

 Have you tried --enable-graphite=no? That ought to work.


I ended up reenabling graphite and setting the stdext to be std, and
commenting out the namespace alias.

That made it work for that.  I have a couple of modules that are still
failing to build.  I'll re-fetch and try again.

Regards
Steven Butler


Re: [Libreoffice] Problems with windows build

2011-02-04 Thread Steven Butler
On 4 February 2011 22:25, Steven Butler sebut...@gmail.com wrote:
 I ended up reenabling graphite and setting the stdext to be std, and
 commenting out the namespace alias.

I'm getting a lot further into the build now, but have encountered a
problem with what looks like dotnet bindings?

I keep getting "climaker.exe has stopped working" popups.

And after a while of dismissing these, saxparser goes into an infinite loop.

This is in the cli_ure module.

Does anyone have any ideas... I've had enough for tonight.

Entering /home/Steve/libre/libo/cli_ure/unotypes

:  PATH=${PATH+${PATH}:}/home/Steve/libre/libo/solver/330/wntmsci12.pro/bin ../wntmsci12.pro/bin/climaker  \
--out ../wntmsci12.pro/bin/cli_uretypes.dll \
--keyfile ../wntmsci12.pro/bin/cliuno.snk \
--assembly-version 1.0.7.0 \
--assembly-description "This assembly contains metadata for the StarOffice/OpenOffice.org API." \
--assembly-company OpenOffice.org \
C:/cygwin/home/Steve/libre/libo/solver/330/wntmsci12.pro/bin/udkapi.rdb

 error: .NET exception occurred: System.AccessViolationException: Attempted to
 read or write protected memory. This is often an indication that other memory
 is corrupt.
   at cppu.bootstrap_InitialComponentContext(Reference<com::sun::star::uno::XComponentContext>* , Reference<com::sun::star::registry::XSimpleRegistry>* , OUString* )
   at ?A0x04292f9a.sal_main()
 dying abnormally...
-- 
Regards,
Steven Butler


Re: [Libreoffice] Problems with windows build

2011-02-04 Thread Steven Butler
Hi,


2011/2/4 Jesús Corrius je...@softcatala.org:
 On Fri, Feb 4, 2011 at 2:18 PM, Steven Butler sebut...@gmail.com wrote:
 I keep getting "climaker.exe has stopped working" popups.

 If you are building it with Visual Studio 2010, it's a known and
 unfortunate issue.


I just realised I have both the VS2010 Express and VS2008 Express C# editions
installed.  It could well be using the VS2010 instance for C#.

I only have VS2008 Express C++ edition installed.  How much effort is
likely required to get it working with 2010, or should I just work on
getting it to use 2008?

Regards
Steven Butler


[Libreoffice] Difficulty building VCL on win32

2011-02-03 Thread Steven Butler
I was unable to build VCL because of an error with Graphite.  I
couldn't see any way to disable graphite in the configure script so I
ended up hard-coding an #undef ENABLE_GRAPHITE into the files that
failed.

Here are the errors I get (same error from winlayout.cxx too).  Does anyone
know what's wrong with my build that causes this?

Compiling: vcl/win/source/gdi/salgdi3.cxx
C:/cygwin/home/Steve/libre/libo/solver/330/wntmsci12.pro/inc\graphite/WinFont.h(32)
: error C2386: 'stdext' : a symbol with this name already exists in
the current scope
C:/cygwin/home/Steve/libre/libo/solver/330/wntmsci12.pro/inc\graphite/WinFont.h(215)
: error C2039: 'hash_map' : is not a member of 'stdext'
...
dmake:  Error code 2, while making '../../../wntmsci12.pro/slo/salgdi3.obj'


Re: [Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

2011-02-02 Thread Steven Butler
in moz\zipped

I found some info on the openoffice.org wiki. The next hurdle seems to
be getting the MS SDK.  I'm downloading the Win7 version as it's smaller
and hopefully okay, but it looks like it will take all night.

I might have this working in a couple of weeks at this rate.

Also, I noticed some oddities.

checking size of long... 0!!

config kept picking up /usr/bin/csc.exe which seems to be some kind of
scheme interpreter.  I just pulled in all the cygwin dev tools so I
guess I ended up with it.  I renamed the file something else and it is
now picking up the DotNet version.

It also appears that ATL and COM are unavailable when using VCExpress, so
some features will presumably go missing.

Regards
Steven Butler


Re: [Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

2011-02-02 Thread Steven Butler
Hurray,

I have finally got past the bootstrap phase and I'm leaving it to build in
the background today while I'm at work.

I had a number of small issues that I had to resolve, including being
unable to execute some of the installers that were downloaded.  A
chmod 755 src/*.exe src/*.EXE seemed to resolve that, but there were a
couple of other issues too.

I will send an update tonight if all goes well.

Regards,
Steven Butler


Re: [Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

2011-02-02 Thread Steven Butler
Sorry Tor - forgot to reply-all and sent only to you previously...
resending to the list.


On 3 February 2011 10:35, Steven Butler sebut...@gmail.com wrote:

 I will send an update tonight if all goes well.

It seems to have failed building VCL - there is an error stating
f268: Error: The image(s) check ... could not be found. (my elision
as different PC)

Could this have something to do with removing icons that was done
recently?  I may have to wait till tonight and try to update the git
checkout(s) to see if that helps.

I also had errors building in a number of other subprojects that I'll
need to look into tonight.

Regards,
Steven Butler


Re: [Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

2011-02-01 Thread Steven Butler
On 1 February 2011 07:53, Tor Lillqvist tlillqv...@novell.com wrote:

 With the clarification that it is the *Cygwin* command line, yes.

 seems I already have gnu make in my path on windows from the mingw

 Nah, that is not usable for this. It must be the Cygwin make that is used for 
 this (and other Cygwin tools, as described in the wiki). (The Cygwin make is 
 as such not used for much in the LO build process, just at the very top 
 level. For the rest LO's own dmake is used.) And to avoid any possibility 
 of confusion, make sure your non-related development environment(s) don't 
 show up in any environment variables (PATH, LIBS, etc) in the LO build 
 environment.

 --tml

Ok, I've not done any more work on developing this as I have been
working on getting a win32 (actually 64 bit win 7) build environment
working tonight.

I haven't got very far but I will try to note the steps I've taken as
I go.  I'm currently going to have to give up for the night as it is
complaining about the MozillaBuildSetup tools and it's 79 MB and
coming down from ftp.mozilla.com at dialup speed :(

I'm very new to git so I gave up on --reference and just did a
straight clone from my SMB share, which was relatively quick, but of
course I only got the bootstrap.  Once I get past bootstrap stage, is
the git part going to grab relative to bootstrap or go straight to
libreoffice.org?  I was thinking about manually cloning each of the
repositories in the clone directory if necessary to short circuit
this.

Here's my steps so far:

1. Install Cygwin - pick all development tools and install (much later)
2. Clone the bootstrap git project from the SMB share and copy the src files.
3. In the Cygwin shell, autogen failed with an odd error related to
native programs and symlinks.  I got past this by doing the following:
    cd /bin
    rm /usr/bin/awk
    cp /usr/bin/gawk.exe awk.exe
    cp /usr/bin/gzip.exe gunzip.exe
4. After this, it seemed to pick my MSVC2008 Express install as the
compiler (I also had several cygwin gcc versions installed but it
seems to have ignored them), then I needed to add the jdk 6 home to
the config option:
    ./autogen.sh --with-jdk-home=/cygdrive/c/Program\ Files/Java/jdk1.6.0_18/
5. I now find I need the mozilla build tools and another config option.
Download
http://ftp.mozilla.org/pub/mozilla.org/mozilla/libraries/win32/MozillaBuildSetup-Latest.exe
(a very slow 2 hour download :( ).  I will install it in the morning if
it has finished downloading...

I'll keep adding to this list in case it helps someone else out.

-- 
Regards,
Steven Butler


Re: [Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

2011-02-01 Thread Steven Butler
Hi,

On 1 February 2011 22:26, Michael Meeks michael.me...@novell.com wrote:
 Hi Steve,

 3. In Cygwin shell, the autogen failed with an odd error related to
 Native programs and symlinks.  I got past this by doing the following:
     cd /bin
     rm /usr/bin/awk
       cp /usr/bin/gawk.exe awk.exe
       cp /usr/bin/gzip.exe gunzip.exe

        Urk; I guess we should try to patch/fix our autogen.sh to work more
 nicely - or is this unavoidable ?

It says it's because non-cygwin programs (native Windows) can't execute
them - I have no idea where they are used (or if they are used) so I
followed some hints off the net to make it stop complaining :)

 4. After this, it seemed to pick my MSVC2008 Express install as the
 compiler (I also had several cygwin gcc versions installed but it
 seems to have ignored them), then I needed to add the jdk 6 home to
 the config option
     ./autogen.sh --with-jdk-home=/cygdrive/c/Program\ Files/Java/jdk1.6.0_18/

After finally installing mozilla-build with the following steps:

7. Rerun autogen:
    ./autogen.sh --with-jdk-home=/cygdrive/c/Program\ Files/Java/jdk1.6.0_18/ \
        --with-mozilla-build=/cygdrive/c/mozilla-build/

    configure: error: Building SeaMonkey is supported with Microsoft
    Visual Studio 2005 only.
8. I downloaded prebuilt seamonkey from here:
    http://tools.openoffice.org/moz_prebuild/OOo3.2/
    I grabbed the 3 files starting with WNT, but I'm not sure what to do
    with them...

Where should I put these to make it all go?

Regards
Steven Butler
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: [Libreoffice] building index at Win32 install time ...

2011-01-31 Thread Steven Butler
On 1 February 2011 06:19, Tor Lillqvist tlillqv...@novell.com wrote:
 OK, so on Windows it then uses the stlport library shipped with LO (if at 
 all; I guess it also is possible that all STL code it uses comes from 
 headers?)

On Windows why can it not use the native STL available there?  Is it
because you need to be able to build with non-MS compilers which don't
come with appropriate redistributable libraries?

Anyway, here's what it links to on Linux (Ubuntu 10.10):
$ ldd idxdict
    linux-vdso.so.1 =>  (0x7fff539a3000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x7f919f183000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7f919ee7d000)
    libm.so.6 => /lib/libm.so.6 (0x7f919ebf9000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7f919e9e3000)
    libc.so.6 => /lib/libc.so.6 (0x7f919e66)
    /lib64/ld-linux-x86-64.so.2 (0x7f919f3c9000)

Regards
Steven Butler


Re: [Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

2011-01-31 Thread Steven Butler
Hi Michael

On 1 February 2011 01:17, Michael Meeks michael.me...@novell.com wrote:
 Hi Steve,

        Sure - so; in response to user input I suspect we can take a second to
 parse the thesaurus; we have around 20Mb of text to load for en_US;
 perhaps 32Mb is a reasonable upper-bound; it does seem a lot to parse so
 quickly.

Where it will hurt is if it is not in cache and the user has some
background task running that hits the disk.

An example might be on Windows with virus scanning (or viruses :) ).

        Right. I think we could easily serialize a small skip-list to disk too
 - if we simply store ~8 or ~32 or so indexes into the data - we can
 parse only a fraction of it, and pop that in our home directory. We
 could also drop the MyThes code too as a depedency to manage.

I'm not sure what you mean by a skip list unless you simply mean a
similar file to the existing .idx, or just a list of offsets for where
the words are to skip loading the whole file.  The trouble with that
approach is the readahead will likely pull in the whole file anyway as
the words aren't generally _that_ far apart in it, so you'll still do
all the IO and just skip a bit of the CPU time.


        The code using it is in:

        lingucomponent/source/thesaurus/libnth/nthesimp.cxx

 BTW, if I did that I'd probably do some major surgery on mythes and
 just use STL because it basically is doing C style memory management
 and processing and I think I would screw it up if I started messing
 with it.  The only problem with simplifying it with STL constructs is
 that I would want to change the interface (string vs char *), maybe
 use STL vectors for the list of synonyms, etc.

        Heh; sure.

I've cooled off on this a bit as performance is slower when using lots
of strings etc.  I was able to change the approach to loading the idx
to treat it as one big buffer, which sped it up considerably too.  This
did mean resorting to lots of pointer tomfoolery, but it is easy to clean
up as there are only 3 allocations instead of 100k+ worth.
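
The big-buffer approach can be sketched roughly as follows.  This is not
the actual change; it assumes the index layout is an encoding line, an
entry-count line, then one "word|offset" line per entry, and the
IdxEntry type and parseIndexInPlace name are invented for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

struct IdxEntry
{
    const char*   word;    // points into the shared buffer, not owned
    unsigned long offset;  // byte offset of the entry in the .dat file
};

// Split the index text held in `buffer` in place: separators are
// overwritten with '\0' so each word is a C string inside the buffer.
// The buffer must outlive the returned entries.
std::vector<IdxEntry> parseIndexInPlace(std::vector<char>& buffer)
{
    std::vector<IdxEntry> entries;
    if (buffer.empty())
        return entries;
    if (buffer.back() != '\n')
        buffer.push_back('\n');   // make sure the last line is terminated

    char* p = buffer.data();
    char* const end = p + buffer.size();
    for (int lineNo = 0; p < end; ++lineNo)
    {
        char* eol = static_cast<char*>(std::memchr(p, '\n', end - p));
        assert(eol != nullptr);   // guaranteed by the terminator above
        *eol = '\0';
        if (lineNo >= 2)          // skip the encoding and count lines
        {
            char* bar = std::strrchr(p, '|');
            if (bar != nullptr)
            {
                *bar = '\0';      // terminate the word in place
                IdxEntry entry = { p, std::strtoul(bar + 1, nullptr, 10) };
                entries.push_back(entry);
            }
        }
        p = eol + 1;
    }
    return entries;
}
```

No per-word string is allocated; the only allocations are the buffer
itself and the entry vector, which could be reserved up front from the
count line.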

        I guess we could re-write it inside lingucomponent then (?) but we
 should prolly get a better understanding of how frequently this code is
 called first - is it hooked into from the spell checking code ? or is it
 really just the Tools-Language-Thesaurus ?

It's actually hooked into the right click menu (probably amongst other
things).  The first time you right click on a word, the dictionary for
the current locale is loaded before the right click menu shows up.
After that, it uses the cached thesaurus dictionary for subsequent
lookups.

If you look in your right-click menu, you'll notice a thesaurus list
of synonyms shows up (assuming the word is found) :).

Regards,
Steven Butler


Re: [Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

2011-01-31 Thread Steven Butler
On 1 February 2011 06:30, Caolán McNamara caol...@redhat.com wrote:
 FWIW, I'm sure Nemeth would be interested if you e.g. wanted to create a
 reimpl of mythes that was faster than the original and perhaps simply
 designate the optimized version the new mythes version with an API/ABI
 change :-)

I don't think there is any need for an API or ABI change as I'm shying
away from an STL reimplementation.  If optimisation is desired
(probably not needed), reducing the string allocations by reading in
the whole index file certainly helps: loading the US dictionary went
from 0.046 seconds to 0.019 seconds with a hot cache.  The speedup is
similar with a cold cache, but I can't recall the numbers exactly -
something like 0.1 seconds down to 0.05 seconds.

I thought it would be possible to use the STL algorithms to do the
binary search and/or use the map, but using all those strings and a
map takes considerably longer than all the strdups in the original (I
recall about 0.08 seconds to load the index using an STL map).  I didn't
measure lookup time but it would be very similar.

Using STL vectors made it comparable, but then it turns out
binary_search only tells you if an item exists, not its index which is
kind of annoying. :)
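
For what it's worth, std::lower_bound does the same binary search and
returns an iterator, from which the index can be recovered.  A minimal
sketch, with indexOfWord as a made-up helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Return the position of `word` in the sorted vector, or -1 if absent.
long indexOfWord(const std::vector<std::string>& sortedWords,
                 const std::string& word)
{
    assert(std::is_sorted(sortedWords.begin(), sortedWords.end()));
    std::vector<std::string>::const_iterator it =
        std::lower_bound(sortedWords.begin(), sortedWords.end(), word);
    if (it == sortedWords.end() || *it != word)
        return -1;                       // first element >= word is not word
    return static_cast<long>(it - sortedWords.begin());
}
```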

So at this point I think an STL rewrite would not result in a
performance improvement, so would be an academic exercise.

Regards,
Steven Butler


Re: [Libreoffice] Should the Thesaurus/mythes use a precomputed index (installer file size)

2011-01-31 Thread Steven Butler
On 31 January 2011 23:14, Tor Lillqvist tlillqv...@novell.com wrote:
 (Hmm, was this message intentionally not to the list?)

An accident; list cc'd.

 So after looking at the wiki I wasn't able to find any instructions on
 how one would go about building the Windows installer.  Are the
 instructions the same as for other platforms?

 Yes and no; if you mean some make install or make dev-install, those 
 don't make sense. (I don't even know what they might do in a Windows 
 LibreOffice build environment.)

 Just doing a normal build successfully on Windows, you end up with an 
 (MSI-based) installer. And if you happen to have NSIS on the machine 
 (optional), also a NSIS wrapper of that, a single executable.

 What toolchain needs to be installed?  Is it cygwin, mingw, or MSVC?

 Cygwin and MSVC2008 or MSVC2010. The Express editions are supposed to work, I 
 think.

I could use MSVC2008 express which I already have installed.  Would
the build work over an SMB share?  I don't really want to redownload
the whole lot (bandwidth is limited on Australian broadband plans) -
so failing doing it over SMB, would copying my existing git repos over
to the Windows machine allow an attempt at a build without too much
breakage?  There's obviously a lot of linux product already in that
build tree.

 And a bunch of other dependencies, but I think their download should now be 
 nicely automated, at least in master. Not 100% automated in the 3-3 branch. 
 (Note that in the 3-3 branch I  think one should not attempt a Windows build 
 in the new way (directly in the directory from the bootstrap repo), but 
 just do it the old way, in the directory from the build repo.)

 Once the toolchain is there, is there a special target for the windows
 installer?

 It gets built in the instsetoo_native module. The actual dmake target name 
 used in its util/makefile.mk is something like openoffice_en-US I think (yes, 
 we should change those openoffice strings there to libreoffice). The MSI 
 installer (.msi and .cab files, setup.exe, and various small other bits) ends 
 up in wntmsci12.pro/LibreOffice/msi/install/native/en-US or somesuch place.

So do I simply type make at the command line under windows?  Hmmm.
seems I already have gnu make in my path on windows from the mingw
Ruby build dev build framework as well.  I wonder if that would work
okay or if I'd need to remove that tool chain from my path to stop
things getting confusing?

Time for work... will hopefully look into this tonight ...


 --tml
-- 
Regards,
Steven Butler


Re: [Libreoffice] RC4 / Windows size analysis ...

2011-01-27 Thread Steven Butler
   Wow - great work :-) I've just pushed this to dictionaries/source in
master, and compiled it there. Still need some tweaks to get it called in
the various dictionaries/ makefiles I suppose - but it is a great start
thanks !

   Licensing wise - I'd like to add the standard LGPLv3+/MPL header to it
(see bootstrap/) but having MIT too is fine if you want.

Correction: a license of LGPL/MPL is fine.  I used the wrong M acronym. :)

I will look further at your other comments on the w/e if I have some time.

Cheers
Steve


Re: [Libreoffice] RC4 / Windows size analysis ...

2011-01-25 Thread Steven Butler
  One idea, can we generate thesaurus idx file during install? That may
  solve few megabytes.

   Oh - right; 4Mb of that - which we can (I assume easily) build at
 install time; I've added that to the spreadsheet, and re-up-loaded it.
 It should be quite fun in fact to re-write the somewhat trivial
 dictionaries/util/th_gen_idx.pl script as a standalone C++ tool - would
 be faster too: it takes ~5 CPU seconds each to index those beasties in
 perl, which would be ~instant in C++.

I have had an attempt at this - code attached.  It is dual licensed under
LGPL / MIT, although there are no (c) headers in the file (feel free to add
some).

I have no idea how this would be integrated into the build process as I'm
not even sure where it is called from, but I'm happy if someone wants to
take up the challenge and/or incorporate it as an installer process.

Here's timing of the CPP version on a Core i5 amd64 generating the
following indices:

libo/clone/libs-extern-sys/dictionaries/ca/th_ca_ES_v3.dat.idx2
libo/clone/libs-extern-sys/dictionaries/cs_CZ/th_cs_CZ_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/da_DK/th_da_DK.dat.idx2
libo/clone/libs-extern-sys/dictionaries/de_AT/th_de_AT_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/de_CH/th_de_CH_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/de_DE/th_de_DE_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/en/th_en_US_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/fr_FR/thes_fr.dat.idx2
libo/clone/libs-extern-sys/dictionaries/hu_HU/th_hu_HU_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/it_IT/th_it_IT_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/ne_NP/th_ne_NP_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/no/th_nb_NO_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/no/th_nn_NO_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/pl_PL/th_pl_PL_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/ro/th_ro_RO_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/ru_RU/th_ru_RU_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/sk_SK/th_sk_SK_v2.dat.idx2
libo/clone/libs-extern-sys/dictionaries/sl_SI/th_sl_SI_v2.dat.idx2

real    0m0.792s
user    0m0.630s
sys     0m0.080s

The same set of files using th_gen_idx.pl took around 5 seconds (although
some basic fixups got it down to 3.5 seconds).

What I have noticed while testing the change was that a lot of the
dictionaries I processed have errors.

These range from having the entry count incorrect, causing the index
process to miss a word (lots of these in some dictionaries), to having
words apparently duplicated either as the next entry, or sometimes a long
way apart.

I have not attempted to fix these dictionary issues, but if they are
serious it might be worth having a perl script that is able to validate
the dictionaries are internally consistent.  Unfortunately, it would have
to use heuristics as the file format makes it difficult to tell in general
what kind of line is being processed.
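
To show what such a heuristic check could look like, here is a sketch
in C++ rather than perl.  It assumes the .dat layout is an encoding
line followed by "word|count" headers, each followed by `count` meaning
lines, and it treats any line of the form "text|digits" with a single
'|' as a header.  That guess can misfire on unusual meaning lines, which
is exactly the ambiguity described above.  All names are invented for
the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Heuristic: a header looks like "word|<digits>" with exactly one '|'.
bool looksLikeEntryHeader(const std::string& line, unsigned long& count)
{
    const std::size_t bar = line.find('|');
    if (bar == std::string::npos || bar == 0 || bar + 1 == line.size())
        return false;
    if (line.find('|', bar + 1) != std::string::npos)
        return false;                       // more than one '|'
    for (std::size_t i = bar + 1; i < line.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(line[i])))
            return false;
    count = std::strtoul(line.c_str() + bar + 1, nullptr, 10);
    return true;
}

// Return the 1-based line numbers of headers whose declared count does
// not match the number of lines that follow before the next header.
std::vector<std::size_t> findBadCounts(std::istream& in)
{
    std::vector<std::size_t> bad;
    std::string line;
    std::getline(in, line);                 // skip the encoding line
    std::size_t lineNo = 1, headerLine = 0;
    unsigned long declared = 0, seen = 0, count = 0;
    while (std::getline(in, line))
    {
        ++lineNo;
        if (looksLikeEntryHeader(line, count))
        {
            if (headerLine != 0 && seen != declared)
                bad.push_back(headerLine);
            headerLine = lineNo;
            declared = count;
            seen = 0;
        }
        else
        {
            ++seen;
        }
    }
    if (headerLine != 0 && seen != declared)
        bad.push_back(headerLine);
    return bad;
}
```

Duplicate words could be detected in the same pass by remembering each
header word in a std::set.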

The CPP version attached differs from the perl script in that when
multiple entries are found for a word, they appear to come out in reverse
order compared to the original perl script.  What I'm curious about is what
impact having multiple entries for a word has when loaded into LibreOffice.

For reference I have attached an improved version of the perl script
that runs a couple of seconds faster than the original.  I had three to
four versions in my tree, but changing any of them did not cause git diff
to show the changes, so I've attached the full copy.

Cheers
Steve.

#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <stdlib.h>
#include <string.h>

static const int MAXLINE = 1024*64;

using namespace std;

int main(int argc, char *argv[])
{
    if (argc != 3 || strcmp(argv[1],"-o"))
    {
        cout << "Usage: th_gen_idx -o outputfile < input\n";
        ::exit(99);
    }
    // This call improves performance by approx 5x
    cin.sync_with_stdio(false);

    const char * outputFile(argv[2]);
    char inputBuffer[MAXLINE];
    multimap<string, size_t> entries;
    multimap<string,size_t>::iterator ret(entries.begin());

    int line(1);
    cin.getline(inputBuffer, MAXLINE);
    const string encoding(inputBuffer);
    size_t currentOffset(encoding.size()+1);
    while (true)
    {
        // Extract the next word, but not the entry count
        cin.getline(inputBuffer, MAXLINE, '|');

        if (cin.eof()) break;

        string word(inputBuffer);
        ret = entries.insert(ret, pair<string, size_t>(word, currentOffset));
        currentOffset += word.size() + 1;
        // Next is the entry count
        cin.getline(inputBuffer, MAXLINE);
        if (!cin.good())
        {
            cerr << "Unable to read entry - insufficient buffer?.\n";
            exit(99);
        }
        currentOffset +=