[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-28 Thread Kevin Funk
https://bugs.kde.org/show_bug.cgi?id=363269

Kevin Funk  changed:

   What|Removed |Added

   Version Fixed In||5.1.2

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-28 Thread Sven Brauch
https://bugs.kde.org/show_bug.cgi?id=363269

Sven Brauch  changed:

   What|Removed |Added

  Latest Commit|54c2740ae97f4e3ee7675ff06ec |https://commits.kde.org/kde
   |5a6dde43c6af4   |velop/3fde9551d5870229d42cc
   ||1a61f152fc8d0876b71
 Status|CONFIRMED   |RESOLVED
 Resolution|--- |FIXED

--- Comment #33 from Sven Brauch  ---
Git commit 3fde9551d5870229d42cc1a61f152fc8d0876b71 by Sven Brauch, on behalf
of Axel Kellermann.
Committed on 27/07/2017 at 22:28.
Pushed by brauch into branch '5.1'.

cmListFileLexer: use memcpy instead of strcpy to copy buffer contents to tokens

This fixes a crash when the input is garbage (e.g. binary file which
is not cmake code at all).
Differential Revision: https://phabricator.kde.org/D6924

M  +7-3projectmanagers/cmake/parser/cmListFileLexer.c

https://commits.kde.org/kdevelop/3fde9551d5870229d42cc1a61f152fc8d0876b71

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-26 Thread Axel Kellermann
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #32 from Axel Kellermann  ---
(In reply to Sven Brauch from comment #31)
> Ah yes, if you trick the mimetype detection, you will still crash. You are
> also right about strncpy.
> 
> phabricator should be simple enough to use, just log in (with
> identity.kde.org), click differential, then click new diff and paste your
> diff.

OK, I added the diff to phabricator:
https://phabricator.kde.org/D6924

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-25 Thread Sven Brauch
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #31 from Sven Brauch  ---
Ah yes, if you trick the mimetype detection, you will still crash. You are also
right about strncpy.

phabricator should be simple enough to use, just log in (with
identity.kde.org), click differential, then click new diff and paste your diff.

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-25 Thread Axel Kellermann
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #30 from Axel Kellermann  ---
(In reply to Sven Brauch from comment #28)
> Patch still looks sensible to me -- maybe use strncpy instead, and put it on
> phabricator.kde.org so somebody familiar with the code can have a look?

strncpy()/strndup() won't work either. Both copy up to n bytes, but also end
processing on the occurence of the first '\0' (at least to my understanding).

I'll have a look at phabricator. Never used it before...

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-25 Thread Axel Kellermann
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #29 from Axel Kellermann  ---
(In reply to Sven Brauch from comment #27)
> Does that still happen with KDevelop 5.1+? I thought I fixed it.

See comment #22. Random .txt files aren't scanned anymore, but .cmake files can
still trigger the faulty behaviour. I conducted my tests with cmListFileLexer.c
from master (last updated beginning of the week). But I extracted it into a
small reproducer application to ease debugging, so if you fixed it somewhere in
the cmake parser outside of cmListFileLexer.c, maybe we don't have to do
anything.

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-25 Thread Sven Brauch
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #28 from Sven Brauch  ---
Patch still looks sensible to me -- maybe use strncpy instead, and put it on
phabricator.kde.org so somebody familiar with the code can have a look?

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-25 Thread Sven Brauch
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #27 from Sven Brauch  ---
Does that still happen with KDevelop 5.1+? I thought I fixed it.

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-25 Thread Axel Kellermann
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #26 from Axel Kellermann  ---
I had another look at the lexer and I think I pinned down the problem with
UTF-16 files in cmListFileLexer.c. Scanning the files works fine, but copying
the scanned content into the token structure in cmListFileLexerSetToken() is
where things go wrong. The code that copies the content uses the functions
strcpy() and strdup() that are meant to be used only with zero-terminated
character arrays. As we handle two-byte characters, where the most significant
byte can be zero (see e.g. letter 'h' with 0x0068), it's possible that the text
to be copied to token->text contains zero-bytes mid-string. In that case
strdup() doesn't do what it's intended to do. It interprets the buffer as
zero-terminated string and only duplicates it up to the first occurence of
'\0'. At the same time the original buffer size is stored in token->length.
This leads to out-of-bounds memory accesses later on.

I attached a simple UTF-16 file that reliably triggers the problem for me
(363269_repro.txt) and a proposal for a fix that replaces the string functions
with mallocs/memcpys (363269_proposal.patch). Maybe someone with more
experience with the cmake parser/lexer could have a look at it.

Related questions: The patch fixes the crashes/ASAN aborts for me, but is the
cmake parser really handling UTF-16 files correctly? Functions like
cmListFileLexer_BOM() imply that it can handle all kinds of UTF formats, but at
the same time the whole code seems to imply that it only works on zero
terminated char arrays. Do we possibly need to update to a newer version of the
lexer (which is generated from external sources, right?)?

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-25 Thread Axel Kellermann
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #25 from Axel Kellermann  ---
Created attachment 106866
  --> https://bugs.kde.org/attachment.cgi?id=106866=edit
Proposal for patch

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2017-07-25 Thread Axel Kellermann
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #24 from Axel Kellermann  ---
Created attachment 106865
  --> https://bugs.kde.org/attachment.cgi?id=106865=edit
Minimal UTF-16 file to trigger behavior

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2016-11-17 Thread Kevin Funk
https://bugs.kde.org/show_bug.cgi?id=363269

Kevin Funk  changed:

   What|Removed |Added

 CC||asiasuppenes...@gmx.de

--- Comment #23 from Kevin Funk  ---
*** Bug 372575 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]

2016-11-17 Thread Kevin Funk
https://bugs.kde.org/show_bug.cgi?id=363269

Kevin Funk  changed:

   What|Removed |Added

Summary|Crash when projects |Crash when projects
   |contains *.txt file that is |contains *.txt file that is
   |actually a binary file  |actually a binary file
   |[cmListFileLexerSetToken]   |[cmListFileLexerSetToken,
   ||CMakeListsParser::readCMake
   ||File]

-- 
You are receiving this mail because:
You are watching all bug changes.

[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]

2016-10-13 Thread Sven Brauch via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #22 from Sven Brauch  ---
Note that while above commit should fix the crash in this case, this bug is
still open, since it would be back if you would rename the file to
"CMakeLists.txt" ... I think.

-- 
You are receiving this mail because:
You are watching all bug changes.


[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]

2016-10-13 Thread Sven Brauch via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #21 from Sven Brauch  ---
Git commit 77b83054f943b6c9bce0da178732f7992f7ada3b by Sven Brauch.
Committed on 13/10/2016 at 18:52.
Pushed by brauch into branch '5.0'.

Remove mime type <-> extension cache

The idea that because one file with extension X has mime type A,
determined by its contents, hence other files with extension X will
have the same mime type is just wrong. One common example where this
breaks in a spectacular way is CMakeLists.txt and the .txt extension.

I found the claim that looking into each file will make the application
unresponsive to be unfounded. QMimeType will only read the first 16kB to
guess the mime type, which takes less than a millisecond for each file.
A test project with three hundred 3 MB binary blobs still loads instantly.
If, in comparison, we parse one of the files as CMake erraneously, we
take multiple seconds.
Differential Revision: https://phabricator.kde.org/D3042

M  +1-35   shell/languagecontroller.cpp

http://commits.kde.org/kdevplatform/77b83054f943b6c9bce0da178732f7992f7ada3b

-- 
You are receiving this mail because:
You are watching all bug changes.


[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]

2016-10-01 Thread Sven Brauch via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=363269

Sven Brauch  changed:

   What|Removed |Added

 CC||m...@svenbrauch.de

--- Comment #20 from Sven Brauch  ---
Thanks for looking. You are indeed right -- .txt files are treated as cmake in
some cases. I also poked around a bit, when it does not find a language for a
file it looks at the contents, guesses the mime type, and caches that. So when
you open CMakeLists.txt, after that all .txt files will be treated as CMake.
That is a bug, and needs to be fixed. The relevant code is, I think, in
languagecontroller.cpp:310ff.

-- 
You are receiving this mail because:
You are watching all bug changes.


[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]

2016-10-01 Thread Axel Kellermann via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #19 from Axel Kellermann  ---
I had some time to kill today, so I took another look at why random .txt files
get processed by the background parser. If I understand the code flow
correctly, it happens like this:
- on startup the project folder is recursively scanned for all contained files
- every found file is wrapped in a ProjectFileItem
(abstractfilemanagerplugin.cpp:234) and added to the projects file set
- the BackgroundParser then creates a ParseJob for every ProjectFileItem (in
BackgroundParser::parseDocumentsInternal())
- in BackgroundParser::createParseJob(), the language to use for the current
ProjectFileItem is queried by a call to
m_languageController->languagesForUrl(qUrl)
- LanguageController::languagesForUrl() in turn checks an internal
mimeTypeCache and fileExtensionCache for the language to use for the given file
- in my case, the fileExtensionCache contains an entry that resolves the
extension txt to language CMake
- that means in backgroundparser.cpp:357 I get back a CMakeManager, that is
then used to create a cmake parse job

Is that expected behaviour or some side effect that has to be fixed? I'm asking
this because there are multiple cmake specific spots in the code that
explicitly filter for "CMakeLists.txt" and "*.cmake", and that are circumvented
by this implicit behaviour.

-- 
You are receiving this mail because:
You are watching all bug changes.


[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]

2016-09-09 Thread Night Nord via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=363269

Night Nord  changed:

   What|Removed |Added

 CC||nightn...@gmail.com

--- Comment #18 from Night Nord  ---
The crash still happens on todays (09.09.2016) 5.0 branch with UTF-16 txt
files. ASAN trace is the same as the one already attached. Without asan it will
crash 100% times either during parsing or when saving the cache, so it pretty
much blocks any work with such projects.

Strangely enough, I don't see why it's trying to parse random *.txt files (in
my case it was 'utf-16.txt' file from reviewboard testdata) - a quick glance at
the code says it should only try to parse 'CMakeFiles.txt' or '*.cmake'.

-- 
You are receiving this mail because:
You are watching all bug changes.


[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]

2016-09-08 Thread Kevin Funk via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=363269

--- Comment #17 from Kevin Funk  ---
Some comments from the dup:

"""
I found the problem, and it's actually similar, but not the same as bug 363269.
The code base being parsed contains a subfolder with example text in a couple
of different asian languages, all stored as .txt files in UTF16LE. The cmake
parser seems to not just read CMakeLists.txt and *.cmake, but also generally
all files with the extension .txt. Apparently the encoding of the example files
is handled correctly, but then the wide characters are given to the lexer that
expects chars, hence the one byte buffer overflow reported (just my guess,
didn't have a look at all the code involved).

If I delete all the UTF16 files, the whole code base is processed successfully.
"""

-- 
You are receiving this mail because:
You are watching all bug changes.


[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]

2016-09-08 Thread Kevin Funk via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=363269

Kevin Funk  changed:

   What|Removed |Added

 CC||axel.kellerm...@gmx.de

--- Comment #16 from Kevin Funk  ---
*** Bug 367841 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are watching all bug changes.


[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]

2016-09-08 Thread Kevin Funk via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=363269

Kevin Funk  changed:

   What|Removed |Added

Summary|Crash when projects |Crash when projects
   |contains *.txt file that is |contains *.txt file that is
   |actually a binary file  |actually a binary file
   ||[cmListFileLexerSetToken]

-- 
You are receiving this mail because:
You are watching all bug changes.